Monday, November 07, 2005

NetSurf - Source code for browser

Right after my last post about the lack of documentation for open-source HTML parsers, I stumbled across this neat browser for RISC OS: NetSurf. And the full source code is neatly documented!

I plan to go over the code for NetSurf in building my new app.

libxml for parsing HTML

Long time no see. I have been busy preparing for my EE PhD qualifiers!

Well anyway, now I have to create an HTTP retriever-and-parser to get files for the 7DS multicast query system. It must fetch not only the result set for the 7DS queries, but also the files themselves, along with associated elements such as images.

I found several HTML parsers for C (after long searches), such as ekhtml (no documentation), tidy (the library does not build properly) and LibWWW (supposed to be very complicated) ... and have settled on using LibXML's built-in HTML parsing tools.

Sad that open source code has very little documentation ... hey, but neither does 7DS yet!