I am given a string containing a comma-separated list of words (where whitespace and case are not significant) and I want a Perl regexp to test the following: the string contains the (complete) word "french" and the (complete) word "english" does not occur earlier. For instance, I want to accept "french", "foobar, french", "bar, french, quux, english", "french, english, french"; but reject "foo, bar", "english, french", "foo, english, bar, french, english".
My goal is to use a regexp of this kind in a lighttpd configuration. To be precise, I want to parse Accept-Language headers, with the naive heuristic that languages are listed in decreasing order of preference, which is often true although not prescribed by the RFC. Hence, I can only use a Perl-compatible regular expression; I cannot use any other features of Perl.
In terms of formal language theory, such a regular expression must exist, but the straightforward solution requires regexp negation, which is painful to perform. (This is why I ask the question with "french" and "english" rather than "fr" and "en", where regexp negation would be tedious but doable by hand.) Are there any Perl-specific regexp features to make it possible to write a concise regexp for my task, or is there a tool to automatically compile a regexp to perform this?
Something like this should work.
Fail only if 'english' occurs before the first 'french' (this is the behaviour the question asks for):
    # /(?i)^(?:(?!\benglish\b).)*?\bfrench\b/

    (?i)                      # Case insensitive
    ^                         # BOS
    (?:
        (?! \b english \b )
        .
    )*?
    \b french \b              # 'french'
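A quick way to sanity-check this pattern is to run it over the accept/reject examples from the question. The sketch below uses Python's `re` module purely as a test harness, on the assumption that it treats this particular syntax (inline `(?i)`, lookahead, `\b`) the same way PCRE does; the helper name `prefers_french` is made up for illustration.

```python
import re

# First pattern from the answer, copied verbatim (without the / delimiters).
FIRST_FRENCH = re.compile(r"(?i)^(?:(?!\benglish\b).)*?\bfrench\b")

def prefers_french(header: str) -> bool:
    """True if 'french' occurs and no 'english' precedes its first occurrence."""
    return FIRST_FRENCH.search(header) is not None

# Examples taken from the question.
accept = ["french", "foobar, french", "bar, french, quux, english",
          "french, english, french"]
reject = ["foo, bar", "english, french",
          "foo, english, bar, french, english"]

for s in accept:
    assert prefers_french(s), s
for s in reject:
    assert not prefers_french(s), s
```

The lazy `*?` combined with the per-character `(?!\benglish\b)` check means the scan stops at the first 'french' it can reach without stepping over the start of an 'english'.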
Fail on any 'english' that occurs before any 'french' (stricter: this also rejects "french, english, french"):
    # /(?i)^(?!.*\benglish\b.*\bfrench\b).*\bfrench\b/

    (?i)                      # Case insensitive
    ^                         # BOS
    (?!                       # Not 'english' .. 'french'
        .* \b english \b
        .* \b french \b
    )
    .* \b french \b           # Must contain 'french'
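To see where the two patterns differ, it may help to run both on the same inputs. This is a rough comparison sketch, again using Python's `re` only as a harness and assuming it handles this syntax like PCRE; `compare` and the constant names are illustrative, not part of any library.

```python
import re

# Both patterns from the answer, copied verbatim (without the / delimiters).
FIRST = re.compile(r"(?i)^(?:(?!\benglish\b).)*?\bfrench\b")
STRICT = re.compile(r"(?i)^(?!.*\benglish\b.*\bfrench\b).*\bfrench\b")

def compare(header: str) -> tuple[bool, bool]:
    """Return (first-pattern result, strict-pattern result) for one header."""
    return (FIRST.search(header) is not None,
            STRICT.search(header) is not None)

print(compare("bar, french, quux, english"))  # (True, True)
print(compare("english, french"))             # (False, False)
print(compare("french, english, french"))     # (True, False)
```

Only the last input separates them: the strict pattern rejects it because some 'english' is followed by some 'french', whereas the question lists that string among the ones to accept.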