

what feedster is good for
(Monday 7th April 2003)

Reading an RSS search for your own blog pays off: I'd never have caught this one, as it doesn't point to my blog but to an article I wrote:
Having battled through that lot, the conscientious aggregator writer hits the next big hurdle: Approximately 10% of RSS feeds are badly formed XML! This issue is covered by Mark Pilgrim in Parsing RSS at all costs where he presents an ultra liberal Python RSS parser which uses Python's relatively forgiving sgmllib module. Great, except PHP doesn't have one of those... enter REX, a technique for "shallow parsing" of XML using regular expressions (no, it's not as kludgy as it sounds - in fact Python's sgmllib module is built on the same principles). Martin Spernau has an excellent article showing how REX can be implemented in PHP and demonstrates the technique in a modified version of the MagpieRSS library. Of course, XML purists (with very good reason) advocate ignoring badly formed feeds but as Mark points out, this really isn't a very practical approach.
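To make the shallow-parsing idea a bit more concrete, here is a rough sketch in Python rather than PHP, and far simpler than the full REX expression. One regular expression splits the feed into comments, CDATA sections, tags and runs of text, and a small loop picks out item titles and links without ever building a tree, so a missing `</item>` or a stray `<` doesn't stop it. The token pattern, the `shallow_tokens` and `extract_items` functions and the choice of `title`/`link` as fields are my own assumptions for illustration, not the code from the article or from MagpieRSS.

```python
import html
import re

# Simplified shallow-parsing tokenizer in the spirit of REX (an assumption,
# much smaller than the real REX expression). Each token is a comment, a
# CDATA section, a tag, a run of character data, or a stray "<".
TOKEN_RE = re.compile(
    r"<!--.*?-->"             # comment
    r"|<!\[CDATA\[.*?\]\]>"   # CDATA section
    r"|<[^<>]*>"              # start/end/empty tag, PI or doctype
    r"|[^<]+"                 # character data
    r"|<",                    # stray "<" that starts no complete tag
    re.DOTALL,
)
TAG_NAME_RE = re.compile(r"<(/?)\s*([A-Za-z_][\w:.-]*)")
FIELDS = {"title", "link"}  # hypothetical choice of item fields to keep


def shallow_tokens(text):
    """Split a document into markup and text tokens without building a tree."""
    return TOKEN_RE.findall(text)


def extract_items(feed):
    """Return one {field: text} dict per <item>, even when tags are unbalanced."""
    items, current, field = [], None, None
    for tok in shallow_tokens(feed):
        if tok.startswith("<!--"):
            continue  # comments carry no content
        if tok.startswith("<![CDATA["):
            text = tok[9:-3]  # CDATA content is taken literally
        elif tok.startswith("<") and len(tok) > 1:
            m = TAG_NAME_RE.match(tok)
            if not m:
                continue  # <?xml ...?>, <!DOCTYPE ...> or junk: ignore
            closing, name = m.group(1) == "/", m.group(2).lower()
            if name == "item":
                if current is not None:
                    items.append(current)  # also closes an item missing </item>
                current = None if closing else {}
                field = None
            elif current is not None and name in FIELDS:
                field = None if closing else name
            continue
        else:
            text = html.unescape(tok)  # decode &amp; and friends; a lone "<" stays text
        if current is not None and field is not None:
            current[field] = current.get(field, "") + text
    if current is not None:
        items.append(current)  # feed ended without </item>
    return items


if __name__ == "__main__":
    broken_feed = (
        "<rss><channel><title>Chan</title>"
        "<item><title>First &amp; best</title><link>http://a.example/1</link></item>"
        "<item><title>No close tag <link>http://a.example/2</link>"
        "<item><title><![CDATA[a < b]]></title></item>"
        "</channel></rss>"
    )
    for item in extract_items(broken_feed):
        print(item)
```

On that sample feed the sketch still returns three items, while a strict parser such as `xml.etree.ElementTree.fromstring` gives up on the whole document with a `ParseError` because of the unclosed second item. That is exactly the trade-off Mark describes: the shallow approach gives up validation in exchange for still getting something usable out of a broken feed.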

[by Martin]


Martin Spernau
© 1994-2003

Big things to come (TM) 30th Dec 2002

Turn it upside down
Oblique Strategies, Ed.3 Brian Eno and Peter Schmidt



useful links:
Google Graph browser
Traumwind 6-Colormatch
UAV News
