Having battled through that lot, the conscientious aggregator writer hits the next big hurdle: approximately 10% of RSS feeds are badly formed XML! Mark Pilgrim covers this issue in Parsing RSS at all costs, where he presents an ultra-liberal Python RSS parser built on Python's relatively forgiving sgmllib module. Great, except PHP doesn't have one of those... enter REX, a technique for "shallow parsing" of XML using regular expressions (no, it's not as kludgy as it sounds; in fact, Python's sgmllib module is built on the same principles). Martin Spernau has an excellent article showing how REX can be implemented in PHP, and he demonstrates the technique in a modified version of the MagpieRSS library. Of course, XML purists advocate (with very good reason) ignoring badly formed feeds, but as Mark points out, this really isn't a very practical approach.
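Spernau's implementation is in PHP. Because the point of comparison here is Python's sgmllib, what follows is a minimal Python sketch of the shallow-parsing idea. It relies on a much simpler token expression than the real REX one. The names `TOKEN_RE`, `TAG_NAME_RE`, `item_titles` and `SAMPLE_FEED` are hypothetical and are not taken from MagpieRSS or Spernau's article. A single regular expression splits the feed into markup and text tokens without validating anything. The code then walks those tokens to pull out item titles, so a feed that a strict parser rejects still yields usable data.

```python
import html
import re
import xml.etree.ElementTree as ET

# A deliberately simplified shallow-parsing tokenizer in the spirit of REX.
# It is NOT the full REX expression: every match is either one piece of markup
# or one run of character data, and nothing is validated, so malformed input
# never raises an error.
TOKEN_RE = re.compile(
    r"<!--.*?-->"            # comment
    r"|<!\[CDATA\[.*?\]\]>"  # CDATA section
    r"|<[^<>]*>?"            # tag (tolerates a missing closing '>')
    r"|[^<]+",               # character data
    re.DOTALL,
)
# Captures an optional '/' (end tag) and the element name of a start/end tag.
TAG_NAME_RE = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)")


def item_titles(feed):
    """Return the text of each <title> inside an <item>, tolerating bad markup."""
    titles, in_item, current = [], False, None  # current: pieces of an open title
    for tok in TOKEN_RE.findall(feed):
        m = TAG_NAME_RE.match(tok)
        if m:
            closing, name = m.group(1) == "/", m.group(2).lower()
            # Finish a title on </title>, or on </item> if </title> is missing.
            if current is not None and closing and name in ("title", "item"):
                titles.append("".join(current).strip())
                current = None
            if name == "item":
                in_item = not closing
            elif name == "title" and in_item and not closing:
                current = []
        elif current is not None:
            if tok.startswith("<![CDATA[") and tok.endswith("]]>"):
                current.append(tok[9:-3])  # strip '<![CDATA[' and ']]>'
            elif not tok.startswith("<"):
                # html.unescape leaves a bare '&' alone instead of failing.
                current.append(html.unescape(tok))
    return titles


# Hypothetical sample feed: a bare '&', an unclosed <link> and no </rss>.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example</title>
<item><title>Fish & Chips</title><link>http://example.com/1</item>
<item><title><![CDATA[Tom & Jerry]]></title></item>
</channel>"""

if __name__ == "__main__":
    try:
        ET.fromstring(SAMPLE_FEED)
    except ET.ParseError as err:
        print("strict parser gave up:", err)
    print(item_titles(SAMPLE_FEED))
```

Run as a script, it first reports the strict parser's `ParseError` and then prints `['Fish & Chips', 'Tom & Jerry']`. The trade-off is exactly what worries the purists. Because the tokenizer never rejects anything, genuinely broken markup is silently guessed at instead of being reported.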
similar entries (vs):
similar entries (cg):