PHP regex - find matches not between specific tags

Tamara Source

Here is an example of a string I want to parse with PHP regular expressions:

this is first %@ variable <bpt>inside tags %@ variable</bpt> trailing %@ variable

What I need to match is every %@ sequence that is NOT between <bpt> and </bpt>. For this string, the pattern should return 2 matches.

This is what I have so far:

%@(?!(?!<bpt).*\/bpt)

It doesn't work as expected and returns only the last occurrence of %@. In the pattern I want to check that no </bpt> closing tag follows the match, while a complete <bpt> ... </bpt> pair after the match should still be allowed.

Link to regex101.

php regex

Answers

answered 2 months ago sumit #1

This is my solution; see the comments for an explanation.

$str = "this is first %@ variable1 <bpt>inside tags %@ variable</bpt> trailing %@ variable2 %@";
// strip out everything inside <bpt>...</bpt>, including the tags themselves
$content = preg_replace('/<bpt>[^<]+<\/bpt>/i', '', $str);
// split the string into words
$arr = explode(" ", $content);
// for each %@ token, return the word that follows it ('' if there is none);
// every other token maps to null
$variable_only = array_map(function ($a, $k) use ($arr) {
    if ($a === '%@') {
        return isset($arr[$k + 1]) ? $arr[$k + 1] : '';
    }
    return null;
}, $arr, array_keys($arr));
// drop the empty entries and reindex the array
$variable_only = array_values(array_filter($variable_only));
print_r($variable_only);

Output:

Array ( [0] => variable1 [1] => variable2 )

answered 2 months ago revo #2

You have to change your regex a little bit:

(?s)%@(?!(?:(?!<bpt>).)*<\/bpt>)


Breakdown:

(?s) # Enable DOTALL flag
%@ # Match `%@`
(?! # Negative lookahead: the `%@` just matched must not
    # be followed by this sequence:
    (?:(?!<bpt>).)* # Any run of characters in which no `<bpt>` starts
    <\/bpt> # A closing `</bpt>`
) # End of lookahead
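
To see the lookahead pattern in action, here is a minimal sketch that runs it through preg_match_all on the string from the question. The variable names and the PREG_OFFSET_CAPTURE flag are only choices made for this illustration, and it assumes the <bpt> tags are balanced and not nested.

```php
<?php
// Sample string from the question.
$str = 'this is first %@ variable <bpt>inside tags %@ variable</bpt> trailing %@ variable';

// Lookahead version: reject a %@ when a </bpt> follows it
// with no opening <bpt> in between.
$pattern = '/(?s)%@(?!(?:(?!<bpt>).)*<\/bpt>)/';

// PREG_OFFSET_CAPTURE (an addition for this sketch) also reports
// where each match starts, which shows which %@ were kept.
$count = preg_match_all($pattern, $str, $matches, PREG_OFFSET_CAPTURE);

echo "matches: $count\n";
foreach ($matches[0] as [$text, $offset]) {
    echo "$text at byte offset $offset\n";
}
```

With this input the two reported matches should be the %@ before <bpt> and the one after </bpt>; the %@ inside the tags is rejected by the lookahead.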

But there is also a more efficient approach. Since PHP (PCRE) is being used, you can use the backtracking control verbs (*SKIP) and (*F):

<bpt>.*?<\/bpt>(*SKIP)(*F)|%@


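This way you match an entire <bpt> ... </bpt> element (lazily, so it ends at the first closing tag), then tell the engine to fail there and skip past everything it just consumed. Only a %@ outside the tags can then match through the second alternative.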
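
A minimal sketch of the (*SKIP)(*F) pattern used from PHP follows. The ~ delimiter, the s modifier (so that a <bpt> element spanning several lines is still skipped) and the {var} replacement text are assumptions added for this illustration; they are not part of the pattern above.

```php
<?php
// Sample string from the question.
$str = 'this is first %@ variable <bpt>inside tags %@ variable</bpt> trailing %@ variable';

// ~ is used as the delimiter so the slash in </bpt> needs no escaping.
// First alternative: consume a whole <bpt>...</bpt> element, then fail
// and skip past it. Second alternative: match a %@ anywhere else.
$pattern = '~<bpt>.*?</bpt>(*SKIP)(*F)|%@~s';

$count = preg_match_all($pattern, $str, $matches);
echo "matches: $count\n";

// The same pattern can drive a replacement; {var} is a made-up placeholder.
$replaced = preg_replace($pattern, '{var}', $str);
echo $replaced, "\n";
```

Because the skipped element is never part of a successful match, preg_replace leaves the text between <bpt> and </bpt> untouched and only rewrites the two outer %@ sequences.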
