The Biotea Project

The World Wide Web has become a dissemination platform for scientific and non-scientific publications. However, most of the information remains locked up in discrete documents that are not always interconnected or machine-readable. The connective tissue provided by RDF technology has not yet been widely used to support the generation of self-describing, machine-readable documents. Biotea is our approach to generating self-describing, machine-readable scholarly documents. We understand the scientific document as an entry point and interface to the Web of Data. We have semantically processed the full-text, open-access subset of PubMed Central. Our RDF model and resulting dataset make extensive use of existing ontologies and semantic enrichment services.


  • Garcia A, Lopez F, Garcia L, Giraldo O, Bucheli V, Dumontier M. 2018. Biotea: semantics for Pubmed Central. PeerJ 6:e4201
  • Garcia Castro LJ, McLaughlin C, Garcia A. 2013. Biotea: RDFizing PubMed Central in support for the paper as an interface to the Web of Data. Journal of Biomedical Semantics 4(Suppl 1):S5


To cite a dataset, please find it in the datasets section.