RegEx needed for Wikipedia infobox

Dr.Kameleon

OK, so here's what I need:

  • We have the full XML of a Wikipedia article
  • We need just the Infobox section

I have tried various things, but my main issue seems to be matching the "internal" (nested) curly brackets. Any ideas, or a regex that has worked for you?

For those of you who do not know what I'm talking about, here's a (somewhat abridged) example of what I'm trying to parse: http://regexr.com?38299

(What is needed is the part from {{Infobox ******* up to its corresponding closing brackets, }}.)

php regex wikipedia wikipedia-api

Answers

answered 4 years ago MElliott #1

OK, I got it! Try this:

(?=\{Infobox)(\{([^{}]|(?1))*\})

Here's the working example:

http://regex101.com/r/kT1jF4
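
The pattern above relies on the engine supporting recursion ((?1)), which PCRE does but Python's built-in re module, for one, does not. Also note that it balances single braces, so its match starts at the second { of {{Infobox. As a rough alternative sketch (not the answerer's method), the same nested matching can be done with a plain depth counter. The function name extract_infobox and the sample text are made up for illustration, and the sketch assumes {{ and }} can be treated as whole tokens:

```python
from typing import Optional


def extract_infobox(wikitext: str) -> Optional[str]:
    """Return the first {{Infobox ...}} template, nested templates included.

    Hypothetical helper: scans for "{{" / "}}" pairs and tracks nesting depth
    instead of using a recursive regex.
    """
    start = wikitext.find("{{Infobox")
    if start == -1:
        return None  # no infobox in this text

    depth = 0
    i = start
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1  # entering a (possibly nested) template
            i += 2
        elif pair == "}}":
            depth -= 1  # leaving a template
            i += 2
            if depth == 0:
                # The braces opened at `start` are now closed.
                return wikitext[start:i]
        else:
            i += 1
    return None  # braces never balanced


# Made-up sample, loosely shaped like an abridged article.
sample = (
    "Intro {{Infobox person | name = X "
    "| birth = {{birth date|1900|1|1}} "
    "| note = {{small|{{lang|en|hi}}}} }} trailing {{cite}}"
)
infobox = extract_infobox(sample)
```

Unlike the regex, this returns the template with both outer brace pairs, and it returns None when the closing brackets are missing rather than backtracking to a shorter match.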
