Abstract
We describe an extension of the ht://Dig robot-based internet indexing and search engine that retrieves information held in a variety of molecular data formats, as defined by chemical MIME types. This is achieved by invoking chemical meta-parsers, software agents designed to provide key meta-data about the content of external chemical files. Where the content of the file allows, this meta-data can include the derived molecular formula, molecular mass and atom connection table (SMILES), as well as other content such as author information and supplied keywords. These terms can be added automatically to the searchable terms, and the search outputs can be linked automatically, via database requests, to external databases containing chemical information. We report our experience in applying this robot to indexing five different remote sites. We discuss different mechanisms for storing and searching the chemical content, ranging from simple keyword-based searches qualified by chemically significant boolean terms, through chemical similarity searches, to our experiments in creating more highly structured content that expresses the chemical data using XML-based markup, with XSLT transforms used for filtering, searching and rendering the information.
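
The meta-parser idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain XYZ coordinate file (MIME type `chemical/x-xyz`), and the names `parse_xyz`, `PARSERS` and `index_terms`, the `mass:` term prefix and the small atomic-mass table are all hypothetical choices made for illustration.

```python
from collections import Counter

# Average atomic masses (g/mol) for a few common elements.
# Illustrative subset only; a real parser would need the full table.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def hill_formula(counts):
    """Format element counts in Hill order: C, then H, then the rest A-Z.

    If the molecule has no carbon, all elements are listed alphabetically.
    """
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def parse_xyz(text):
    """Hypothetical meta-parser for XYZ files.

    Assumed layout: line 1 is the atom count, line 2 is a title/comment,
    and each following line is 'Symbol x y z'.
    """
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    title = lines[1].strip()
    symbols = [line.split()[0] for line in lines[2:2 + n_atoms]]
    counts = Counter(symbols)
    mass = sum(ATOMIC_MASS[s] * n for s, n in counts.items())
    return {"title": title, "formula": hill_formula(counts), "mass": round(mass, 3)}


# Dispatch table from chemical MIME type to meta-parser (hypothetical).
PARSERS = {"chemical/x-xyz": parse_xyz}


def index_terms(mime_type, text):
    """Return extra searchable terms for a file, or [] if no parser is registered."""
    parser = PARSERS.get(mime_type)
    if parser is None:
        return []
    meta = parser(text)
    return [meta["formula"], f"mass:{meta['mass']}"] + meta["title"].split()


WATER_XYZ = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""

print(index_terms("chemical/x-xyz", WATER_XYZ))
```

For the water example, the derived formula term is `H2O`. In a real indexer, terms of this kind would be appended to the words the robot already extracts from the linking page, so that a keyword search could be qualified by formula or mass.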
Original language | English |
---|---|
Pages (from-to) | 656-666 |
Number of pages | 11 |
Journal | New Journal of Chemistry |
Volume | 26 |
Issue number | 5 |
Publication status | Published - 2002 |