REX, ported to PHP
The approach Cameron follows here is to build one big regular expression that matches
each part of an XML document, and thereby splits the document into a flat list of markup
items (tags, comments and the like) and the text enclosed between them.
To understand the following example, please read
the paper "REX: XML Shallow Parsing with Regular Expressions" by Robert D. Cameron, which explains the
various parts of the regular expression being put together.
See especially the Perl code in Appendix A: Shallow Parsing in Perl.
A first working version in PHP can be found here.
Execute test2.php. (The sample XML file here is a snapshot of an RSS file downloaded from Mark Pilgrim's site on 24th September. Note: this file is valid, well-formed XML, so it would give any other XML parser no trouble; it is only here as an example of XML data.)
What can be seen clearly when executing the code is the 'shallow' characteristic of this approach. We simply get a flat list of all 'tokens' in the order of their appearance, with no nesting and no tree. If we were to 'join' this list again, we would have exactly the data we started with, with all whitespace, indentation and line breaks intact. (This is actually an important characteristic of this approach.)
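To make the 'flat list' and the 'join' property concrete, here is a minimal sketch. Note that this is not Cameron's REX expression and not the code from test2.php: the pattern is a much simplified stand-in that I am assuming for illustration only, and the function name shallow_parse and the tiny sample document are made up. It will, for example, split a tag wrongly if an attribute value contains a '>', which the real REX expression handles.

```php
<?php
// Simplified stand-in for the REX expression (an assumption for illustration,
// NOT Cameron's full grammar). Each alternative matches one kind of token.
function shallow_parse($xml)
{
    $pattern = '~<!--.*?-->'            // comment
             . '|<!\[CDATA\[.*?\]\]>'   // CDATA section
             . '|<\?.*?\?>'             // processing instruction or XML declaration
             . '|<[^>]*>?'              // any other markup, even if left unterminated
             . '|[^<]+~s';              // text up to the next '<'

    // $matches[0] holds the full matches in document order: the flat token list.
    preg_match_all($pattern, $xml, $matches);
    return $matches[0];
}

// Hypothetical sample data, standing in for the RSS snapshot.
$xml = "<?xml version=\"1.0\"?>\n"
     . "<rss version=\"2.0\">\n"
     . "  <title>Hello &amp; welcome</title>\n"
     . "  <!-- a comment -->\n"
     . "</rss>\n";

$tokens = shallow_parse($xml);

// Print every token, whitespace-only text tokens included.
foreach ($tokens as $i => $token) {
    printf("%2d: %s\n", $i, json_encode($token, JSON_UNESCAPED_SLASHES));
}

// Joining the tokens gives back the input byte for byte.
var_dump(implode('', $tokens) === $xml);
```

Every character of the input is either a '<', which starts one of the markup alternatives, or something else, which falls into the text alternative. Nothing is skipped, and that is why joining the list reproduces the original document.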
Obviously this list does not amount to much yet, and doesn't really buy us anything on its own. What to do with it will be discussed in the next part.
test2.php code follows:
All image, text and audio material is © Martin Spernau; use and reproduction require the author's consent.