biotea-ontololgy
An ontology to support the RDFization of scientific documents in the Biotea project. Specifically, it supports group-based distribution and semantic similarity.
Background
Semantic similarity
Similarity between a query document and a related document, taking into account the semantic annotations identified in their text. A semantic annotation is a word or set of words associated with an ontological term. The similarity can be restricted to particular groups defined in a model.
Group-based distribution
Distribution of all the terms in a document according to a model describing a set of groups. Every group gets a score in the range [0.0, 1.0], and the scores of all groups sum to 1.0.
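As a minimal sketch of how such a distribution could be computed, the snippet below assumes that each group's score is its share of the annotated term occurrences in the document. That normalization, the function name `group_distribution`, and the example counts are illustrative assumptions, not something the ontology prescribes.

```python
def group_distribution(group_term_counts):
    """Normalize per-group term counts into scores that sum to 1.0.

    Assumption: a group's score is its fraction of all annotated
    term occurrences in the document.
    """
    total = sum(group_term_counts.values())
    if total == 0:
        raise ValueError("document has no annotated terms")
    return {group: count / total for group, count in group_term_counts.items()}


# Hypothetical term counts per semantic group for one document.
counts = {"DISO": 6, "GENE": 3, "CHEM": 1}
scores = group_distribution(counts)
```

Each score falls in [0.0, 1.0] and the scores add up to 1.0, which matches the constraint stated above.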
Semantic groups model
Both group-based distribution and semantic similarity refer to a model of semantic groups such as UMLS or Biolinks. Biolinks is a customization of the UMLS semantic groups that aims for more granular groups. The groups considered in Biolinks are: ACTI (Activities & Behaviors), ANAT (Anatomy), CHEM (Chemical entities), CONC (Concepts & Ideas), DEVI (Devices), DISO (Disorders), DRUG (Drugs), GENE (Genes & Molecular Sequences), GEOG (Geographic Areas), GNPT (DNA & Protein molecules), OBJC (Objects), OBSV (Physiology attributes & processes), OCCU (Occupations), ORGA (Organizations), PEOP (People and population groups), PHEN (Phenomena), PHYS (Physiological functions), PROC (Procedures), SYMP (Disorder symptoms) and TAXA (Taxonomic terms).
Other ontologies
Our model references terms in the following ontologies:
Ontology at a glance
Semantic similarity model
Group-based distribution model
Classes
Biolink
ElementSelector
Selector used in annotations to point to elements within an RDF file. In Biotea this selector is used to link an annotation to the section or paragraph where the corresponding term was identified.
used as domain in object properties: ao:onResource
used as range in object properties: ao:context
Model
A model of semantic groups, e.g., UMLS or Biolinks semantic groups. Each set of concepts is associated with one and only one group in the model.
used as domain in object properties: dcterms:subject
used as domain in data properties: biotea:group, rdfs:label
used as range in object properties: biotea:hasModel
SemanticAnnotation
A portion of a document (i.e., a word or sequence of words) associated with a semantic entity (e.g., a CUI in UMLS).
used as domain in object properties: dcterms:references
used as domain in data properties: biotea:idf, biotea:tf
used as range in object properties: biotea:link
Topic
Group name and distribution score calculated for such a group.
used as domain in data properties: biotea:score, rdfs:label
used as range in object properties: biotea:hasTopic
TopicDistribution
A class representing a Biolinks group-based distribution. Such a distribution is defined over a document and all Biolinks groups. A score is associated with each group, representing the weight of that group in the document.
used as domain in object properties: biotea:annotator, biotea:hasModel, biotea:onDocument, biotea:hasTopic, pav:createdBy
used as domain in data properties: biotea:totalTF, pav:createdOn
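To illustrate how the properties listed for TopicDistribution and Topic fit together, here is a hedged sketch that builds subject-predicate-object tuples for one distribution. The property and class names come from this document. The `ex:` identifiers, the topic naming scheme, and the helper `topic_distribution_triples` are hypothetical and are not part of Biotea.

```python
def topic_distribution_triples(distribution, document, model, scores, total_tf):
    """Build (subject, predicate, object) tuples for one TopicDistribution.

    All identifiers passed in are hypothetical prefixed names; only the
    biotea:/rdfs: property names are taken from the ontology description.
    """
    triples = [
        (distribution, "rdf:type", "biotea:TopicDistribution"),
        (distribution, "biotea:onDocument", document),
        (distribution, "biotea:hasModel", model),
        (distribution, "biotea:totalTF", total_tf),
    ]
    for group, score in sorted(scores.items()):
        # Hypothetical naming scheme for the Topic resources.
        topic = f"{distribution}/topic/{group}"
        triples.append((distribution, "biotea:hasTopic", topic))
        triples.append((topic, "rdf:type", "biotea:Topic"))
        triples.append((topic, "rdfs:label", group))
        triples.append((topic, "biotea:score", score))
    return triples


triples = topic_distribution_triples(
    "ex:dist1", "ex:doc1", "ex:biolinks", {"DISO": 0.6, "GENE": 0.4}, 10
)
```

Each group yields one Topic carrying its label and score, linked from the distribution through biotea:hasTopic.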
Object properties
annotator
hasModel
hasTopic
link
Link between a biotea:Biolink and a semantic annotation. Used to record the semantic annotations participating in the similarity between two documents.
declared domains: biotea:Biolink
suggested ranges: biotea:SemanticAnnotation
onDocument
Points to the document for which the group-based distribution has been calculated.
declared domains: biotea:TopicDistribution
onQueryDocument
Points to the query document used to calculate a semantic similarity.
declared domains: biotea:Biolink
Points to the compared document used to calculate a semantic similarity.
declared domains: biotea:Biolink
paragraphList
Points to a list of paragraphs in an RDFized article.
suggested domains: doco:Section
suggested ranges: rdf:Seq
sectionList
Points to a list of sections in an RDFized article.
suggested domains: bibo:Document, doco:Section
suggested ranges: rdf:Seq
Data type properties
group
idf
Inverse document frequency of a term in a collection of documents.
suggested domains: biotea:SemanticAnnotation
declared type: xsd:double
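For readers unfamiliar with the measure, the following sketch computes an inverse document frequency under the common definition idf = ln(N / df), where N is the number of documents in the collection and df is the number of documents containing the term. This is an assumed variant: the document does not state which formula Biotea uses, and the function name is illustrative.

```python
import math


def inverse_document_frequency(total_documents, documents_with_term):
    """Return ln(N / df), one common IDF definition (assumed here)."""
    if not 0 < documents_with_term <= total_documents:
        raise ValueError("documents_with_term must be in [1, total_documents]")
    return math.log(total_documents / documents_with_term)


# A term found in 10 of 1000 documents is rarer, hence more informative,
# than a term found in every document (whose IDF is zero).
rare = inverse_document_frequency(1000, 10)
common = inverse_document_frequency(1000, 1000)
```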
occurrences
score
tf
totalTF
Total term frequency of all the terms used to calculate a TopicDistribution.
suggested domains: biotea:TopicDistribution
declared type: xsd:integer