A robot-based procedure is described for traversing a collection of hyperlinked HTML documents and converting them to well-formed, XML-compliant XHTML. Transcluded chemical content invoked using the HTML <embed> or <applet> elements is converted to the <object> form recommended for XHTML. Additional attributes, such as a title, and derived chemical attributes, such as a SMILES descriptor, are added to improve the indexing of the resulting document collection. Conformance tests for popular Web browsers are reported.
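The <embed>-to-<object> rewrite can be illustrated with a minimal Python sketch. It is not the procedure reported in the article: the class name `EmbedToObject`, the helper `to_xhtml`, the attribute mapping in `KEEP`, and the file name `caffeine.mol` are all assumptions made for illustration. The sketch only lowercases tags, quotes attributes, closes void elements, and renames `src` to `data`; it does not repair unbalanced tags, handle <applet>, or derive chemical attributes such as SMILES.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that have no closing tag in HTML and are self-closed in XHTML.
VOID = {"br", "hr", "img", "input", "meta", "link"}

# Hypothetical mapping from <embed> attributes to <object> attributes.
KEEP = {"src": "data", "type": "type", "width": "width",
        "height": "height", "title": "title"}


class EmbedToObject(HTMLParser):
    """Re-emit HTML as XHTML-style markup, rewriting <embed> as <object>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    @staticmethod
    def _attrs(attrs):
        # XHTML needs quoted values; a minimized attribute gets its own name.
        return "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased tag and attribute names.
        if tag == "embed":
            mapped = [(KEEP[k], v) for k, v in attrs if k in KEEP]
            self.out.append(f"<object{self._attrs(mapped)}></object>")
        elif tag in VOID:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")
        else:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_endtag(self, tag):
        # Void elements and <embed> were already closed when they were opened.
        if tag not in VOID and tag != "embed":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))


def to_xhtml(html):
    parser = EmbedToObject()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(to_xhtml('<P>Caffeine: <EMBED SRC="caffeine.mol" '
                   'TYPE="chemical/x-mdl-molfile" WIDTH=200 HEIGHT=200></P>'))
```

A robot of the kind described would apply such a rewrite to each page it fetches while following hyperlinks; that traversal step is omitted here.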
|Number of pages||6|
|Journal||Journal of Chemical Information and Computer Sciences|
|Early online date||26 Mar 2001|
|Publication status||Published - 2001|