The aim of the ART project was to develop a set of content metadata, CISP (the Core Information about Scientific Papers), and a tool, SAPIENT (Semantic Annotation of Papers: Interface and Enrichment Tool), for annotating scientific papers with CISP concepts. SAPIENT takes text in XML as input and adds semantic information in the form of inline markup. The papers annotated with CISP concepts, the ART-corpus (200 papers, ~4 million words), constitute a useful resource for text mining and natural language processing research.
We developed CISP using an ontology methodology, with its coherent logic, clear semantics and explicit definitions of entities. Our analysis of papers was based on an ontological representation of investigations. A scientific paper is one of many ways, and the most typical, of representing an investigation. Our main assumption is that a scientific paper, as a representation of the content of a scientific investigation, needs to contain the key concepts for describing investigations. First, we identified which concepts are essential for describing scientific investigations. Second, we proposed a set of the most essential concepts for representing scientific papers: the Core Information about Scientific Papers (CISP). It comprises eight key classes: Goal of the investigation, Motivation, Object of the investigation, Method of the investigation, Experiment, Observation, Result, Conclusion.
This approach can be applied to investigations where the research is driven by experimental methods (physically executed experiments, computationally run experiments, or theoretical experiments). So far CISP has been validated by experts in physical chemistry and by researchers from different fields who participated in an online survey. Within the ART project, CISP has been applied to a set of 200 papers to verify our approach.
We used the CISP classes as the basis for an annotation scheme in which concepts and their properties were combined to produce flat labels. We then developed a set of guidelines that explain the annotation labels in detail and contain useful examples, to allow annotation by human experts. SAPIENT, developed within the ART project, is an annotation tool that enables experts to annotate scientific papers sentence by sentence, in combination with our annotation guidelines. It is built as a web application with a client-server architecture, which makes it platform independent and easier to incorporate into an online workflow. We used state-of-the-art web technologies to develop it. Within the ART project, SAPIENT has been used by 16 experts, who applied CISP to a set of 200 papers from RSC Publishing journals in order to construct a corpus of CISP-annotated papers (the ART-corpus). We worked with internal and external experts to validate CISP and the tool. We maintain a project website to make the project outputs open to everyone, and we gave many presentations and demos of SAPIENT and CISP.
SAPIENT is designed to be general across scientific areas, and the goal is eventually to produce an interface tool that can be used to annotate any scientific investigation. However, to demonstrate our approach in full detail we had to focus on one particular scientific area, and we selected physical chemistry. The content metadata CISP has the potential to be widely used for the annotation of scientific resources from biomedical domains. All the CISP labels except one are classes from OBI (Ontology of Biomedical Investigations), which is quickly becoming a de facto standard for the description of biomedical resources, so CISP can help to annotate such resources. SAPIENT implements annotation with CISP and has the potential to influence the publishing pipeline.
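The sentence-level inline markup described above can be pictured with a minimal Python sketch. It is an illustration only, not SAPIENT's implementation: the `<s>` sentence element, the `id` and `cisp` attributes, the shortened class names and the `annotate` function are all hypothetical, and the project's actual flat labels and XML schema are not specified here.

```python
import xml.etree.ElementTree as ET

# Shortened names for the eight CISP classes listed in the description
# (e.g. "Goal" stands for "Goal of the investigation").
CISP_CLASSES = (
    "Goal", "Motivation", "Object", "Method",
    "Experiment", "Observation", "Result", "Conclusion",
)


def annotate(xml_text, labels):
    """Return xml_text with a CISP label added inline to chosen sentences.

    `labels` maps a sentence id to one CISP class name. The <s> element and
    the `id` / `cisp` attributes are assumptions made for this sketch.
    """
    root = ET.fromstring(xml_text)
    for sentence in root.iter("s"):
        label = labels.get(sentence.get("id"))
        if label is None:
            continue  # leave unlabelled sentences untouched
        if label not in CISP_CLASSES:
            raise ValueError(f"unknown CISP class: {label}")
        sentence.set("cisp", label)  # the inline markup
    return ET.tostring(root, encoding="unicode")


# A made-up two-sentence paper, labelled as an expert annotator might.
paper = (
    "<paper>"
    '<s id="1">We aim to measure the reaction rate.</s>'
    '<s id="2">The rate doubled at 350 K.</s>'
    "</paper>"
)
annotated = annotate(paper, {"1": "Goal", "2": "Result"})
print(annotated)
```

Keeping the label as an attribute on the sentence element leaves the original text intact, so the annotated XML can still be read by tools that ignore the extra attribute.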
Effective start/end date: 01 Jun 2007 to 31 Mar 2009

  • JISC: £100,000.00


