Corpus-based Terminological Evaluation of Ontologies
Marco Rospocher ∗, Sara Tonelli, Luciano Seraﬁni and Emanuele Pianta
Fondazione Bruno Kessler-irst
Via Sommarive 18 Povo
I-38123, Trento, Italy
Abstract. We present a novel system for corpus-based terminological evaluation of ontologies. Starting from the assumption that
a domain of interest can be represented through a corpus of text documents, we ﬁrst extract a list of domain-speciﬁc key-concepts from the corpus, rank them by relevance, and then apply various evaluation metrics to assess the terminological coverage of a domain ontology with respect to the list of key-concepts.
Among the advantages of the proposed approach, we remark that the framework is highly automatizable, requiring little human intervention. The evaluation framework is made available online through a collaborative wiki-based system, which can be accessed by different users, from domain experts to knowledge engineers.
We performed a comprehensive experimental analysis of our approach, showing that the proposed ontology metrics allow for assessing the terminological coverage of an ontology with respect to a given domain, and that our framework can be effectively applied to many evaluation-related scenarios.
Keywords: Corpus-based ontology evaluation, terminological ontology evaluation, key-concept extraction, ontology building environment
1. Introduction An ontology is a formal conceptualization of some domain of interest. Ontologies are increasingly used for organizing information in several application ﬁelds, including among others the Semantic Web, knowledge representation and management, biomedical informatics, software engineering, and enterprise management. Several methodologies and tools are available to support building and developing ontologies (see Gómez-Pérez et al. (2004) for a detailed overview of the Ontology Engineering ﬁeld).
Like any engineering artifact, an ontology needs to undergo thorough evaluation, for example to understand whether it adequately describes a given domain of interest, or to check whether it is formally correct. Ontology evaluation is the task of investigating the quality of an ontology. The investigation can concern different levels (as summarized in the survey provided in Brank et al. (2005)), such as the terminological level1 (“Does the ontology represent the relevant terms of the domain of interest?”), the syntactic level (“Does the ontology match the syntactic requirements of the formal language adopted?”), the hierarchical or taxonomical level (“Does the ontology structurally fit the domain of interest?”), and the semantic level (“Does the underlying semantic model in the ontology correctly represent the domain of interest?”).
The contribution presented in this paper concerns the terminological level, since it aims at assessing whether an ontology adequately covers the domain of interest, i.e. whether the concepts used in the ontology comprehensively represent the relevant terms of a domain.
More specifically, we present a framework for the corpus-based terminological evaluation of ontologies, where an ontology is terminologically evaluated against a text corpus representative of the domain of interest. The approach is based on the extraction of a list of relevant concepts (aka key-concepts) from
* Corresponding author: Marco Rospocher, email@example.com.
1 We prefer to adopt the term terminological in place of lexical/vocabulary used in Brank et al. (2005).
a domain corpus, ranked according to their relevance, and a matching-based comparison (aka matching) between the concepts formalized in the ontology and the extracted key-concepts. To obtain a more accurate result, the matching relies on synonymy information available in WordNet (Fellbaum, 1998). Based on the resulting matching, several evaluation metrics are defined to assess whether the given ontology adequately covers the terminology of the domain described by the text corpus.
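As an illustration of synonymy-aware matching, the sketch below aligns ontology concept labels with extracted key-concepts through shared synonyms. A tiny hand-coded synonym table stands in for WordNet, and the label normalization is an illustrative assumption, not the system's actual implementation.

```python
# Sketch of synonymy-aware matching between ontology concept labels and
# extracted key-concepts. The SYNONYMS table is a hand-coded stand-in for
# WordNet; a real implementation would query WordNet synsets instead.

SYNONYMS = {  # illustrative stand-in for WordNet synsets
    "car": {"car", "automobile", "auto"},
    "illness": {"illness", "disease", "sickness"},
}

def normalize(label: str) -> str:
    """Lowercase and replace underscores, a common label normalization."""
    return label.lower().replace("_", " ").strip()

def synset(term: str) -> set:
    """Return the term together with any synonyms known for it."""
    term = normalize(term)
    for syns in SYNONYMS.values():
        if term in syns:
            return syns
    return {term}

def match(ontology_labels, key_concepts):
    """Return the key-concepts covered by some ontology label,
    either directly or via a shared synonym."""
    onto_terms = set()
    for label in ontology_labels:
        onto_terms |= synset(label)
    return [kc for kc in key_concepts if synset(kc) & onto_terms]
```

For example, `match(["Automobile"], ["car", "wheel"])` covers "car" via the shared synset even though the surface strings differ.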
The terminological evaluation of ontologies has been widely studied in the literature (we refer the reader to Section 2 for an overview of available proposals, and a comprehensive comparison between them and our approach); nevertheless, our contribution is novel in several respects, and presents many advantages over other state-of-the-art proposals:
High level of automation: human intervention is limited to the selection of the reference corpus and, possibly, to the tuning of the terminology extraction module. There is no need for a manually built gold standard;
On-line evaluation environment: a collaborative system fully implementing the proposed evaluation framework has been developed and made publicly available2. Therefore, users can exploit our framework to terminologically evaluate any OWL ontology against any text corpus (several popular digital text ﬁle formats are supported);
Domain independence: domain-speciﬁc language resources are not required;
Weighted coverage assessment: thanks to the relevance-based ordering of the key-concepts extracted from the corpus, the methodology can be applied to assess whether the most important concepts in the domain (as opposed to the marginal ones) are covered by the ontology.
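The weighted coverage idea above can be sketched as a relevance-weighted score: key-concepts higher in the ranking contribute more to the coverage figure. The inverse-rank weighting below is an illustrative assumption; the paper's actual metrics may weight key-concepts differently.

```python
# Sketch of a relevance-weighted terminological coverage score. Inverse-rank
# weights are an illustrative assumption, not the paper's exact metric.

def weighted_coverage(ranked_key_concepts, covered):
    """Fraction of total relevance mass covered by the ontology.

    ranked_key_concepts: key-concepts ordered by decreasing relevance.
    covered: set of key-concepts matched by some ontology concept.
    """
    weights = {kc: 1.0 / rank
               for rank, kc in enumerate(ranked_key_concepts, start=1)}
    total = sum(weights.values())
    hit = sum(w for kc, w in weights.items() if kc in covered)
    return hit / total if total else 0.0
```

Under this scheme, an ontology covering only the top-ranked concept of three scores well above one third, reflecting that the most important concept is covered.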
We perform a comprehensive experimental analysis of our approach, showing that the evaluation metrics proposed appropriately capture the terminological adequacy of an ontology with respect to a domain.
Such metrics can also be employed to effectively and efficiently rank candidate ontologies according to how well they terminologically cover a given domain, or to identify which of the concepts formalized in an ontology are the most relevant to the domain.
We remark that the contribution presented here allows evaluating the terminological level of the ontology, i.e. whether the terms used as concepts in the ontology are the relevant terms of a domain of interest; it does not deal with the evaluation of the semantic level of the ontology, that is, whether the axiomatization of the domain encoded in the ontology (i.e. the OWL axioms characterizing the concepts and properties in the ontology) is correct and complete.
The paper is structured as follows. In Section 2 we present a comprehensive overview of available proposals for ontology evaluation at the terminological level. In Section 3 we describe our corpus-based ontology evaluation framework together with the ontology metrics we propose to adopt, while in Section 4 we describe the collaborative system we developed to implement the proposed approach. Furthermore, in Section 5 we detail some application scenarios in which our corpus-based evaluation framework can be effectively applied. In Section 6 we report the detailed experimental analysis that we performed to evaluate our framework, while in Section 7 some limitations of our approach are discussed. Finally, we draw some conclusions in Section 8 and present the future research directions we plan to undertake.
2. Related work
Ontology evaluation can be based on different approaches. One of them is the manual revision by experts, which however has several drawbacks, being time-consuming and sensitive to the subjective nature of human interpretation and judgement. Some tools have been developed to support the user in manual ontology revision by assigning weights and values to the dimensions characterizing an ontology, for example the OntoMetric Tool (Lozano-Tello and Gómez-Pérez, 2004) and the COAT tool (Bolotnikova et al., 2011). The latter is focused on the evaluation of the cognitive ergonomicity of ontologies, i.e. on aspects concerning the human speed of perception and the cognitive soundness.
2 To the best of our knowledge, this is the first collaborative system of this kind made publicly available.
As for automatic evaluation, some attempts have been made to define appropriate standards and requirements. A well-studied approach is the evaluation of the ontology against a reference ontology, aka gold standard. Many metrics have been proposed to compare ontologies both at the lexical and the conceptual level:
Maedche and Staab (2002) measure both the lexical overlap between concept names and the taxonomic structure of two ontologies in an empirical study on the tourism domain, while Dellschaft and Staab (2006) suggest a number of criteria for evaluation as well as several measures of similarity. Note that although the comparison between the evaluated ontology and the gold standard can be easily automatized, the building of the gold standard is still manual.
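A simplified, string-level version of such a lexical comparison can be sketched as follows. Averaging the best string-similarity score of each concept name against the other ontology's names approximates, but does not reproduce, Maedche and Staab's string-matching measure.

```python
import difflib

# Simplified lexical comparison between two ontologies' concept-name sets,
# in the spirit of gold-standard evaluation. This is an approximation of
# string-matching overlap measures, not Maedche and Staab's exact metric.

def lexical_overlap(names_a, names_b):
    """Average, over names in A, of the best string similarity
    to any name in B (0.0 when either set is empty)."""
    if not names_a or not names_b:
        return 0.0
    scores = []
    for a in names_a:
        best = max(
            difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
            for b in names_b
        )
        scores.append(best)
    return sum(scores) / len(scores)
```

Note that the measure is asymmetric: it rewards ontology A for each of its names finding a close match in B, regardless of how much of B is matched in return.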
A further approach for ontology evaluation is application-based in that it measures the quality of an ontology based on the improvement achieved by an application that is built upon it. Porzel and Malaka (2004), for example, evaluate the accuracy of an ontology by integrating it in a system for relation tagging.
In this work, we propose a methodology to evaluate an ontology based on a domain corpus. Few works go in this direction, and no evaluation system has been made available so far. In Brewster et al. (2004) the authors present a data-driven methodology for evaluating an ontology by comparing it with a corpus representing the domain area. This approach is the most similar to ours, since it is based upon the same principle, i.e. a domain corpus can be used as a starting point to evaluate the terminological adequacy of an ontology representing the knowledge of the same domain. The authors present a ﬁrst evaluation methodology based on a vector space representation of the terms shared by an ontology and a corpus. However, the corpus is built by collecting 41 arbitrary texts from the Internet concerning arts and artists, therefore it cannot be seen as a reference corpus for a domain. Besides, none of the ﬁve ontologies compared to the corpus has been independently evaluated, so no real evidence of the efﬁcacy of this evaluation approach is given. The authors also present a more sophisticated methodology, proposing to measure the “ﬁt” between an ontology and the corpus as the conditional probability of the ontology given a corpus. Although the approach seems very interesting, neither related experiments nor evaluation are reported.
Jones and Alani (2006) present a methodology inspired by Brewster et al. (2004), but select the corpus based on a Google query extended with WordNet terms. Tf-Idf (Term frequency / Inverse document frequency) is then applied to the corpus in order to extract the top 50 potential concept labels to match against the ontology. The authors show that their approach can be applied to rank 10 candidate ontologies according to the corpus domain, with high correlation with human judgement. However, their evaluation is focused only on the ranking and no attempts are made to ﬁnd a relevance score that represents in absolute terms the quality of the ontology with respect to the domain. Besides, both the corpus creation and the term extraction are quite simplistic and may require some further reﬁnement.
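The kind of tf-idf term ranking used by Jones and Alani to obtain candidate concept labels can be sketched minimally as follows. The toy tokenizer and scoring-by-summation are illustrative assumptions; real extraction pipelines add stopword filtering, lemmatization, and multi-word terms.

```python
import math
from collections import Counter

# Minimal tf-idf term ranking over a corpus of documents, illustrating
# the style of term extraction used to obtain candidate concept labels.
# Tokenization here is naive whitespace splitting (an assumption).

def top_terms(docs, k=50):
    """Rank terms by the sum of their tf-idf scores across documents."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = Counter()
    for doc in tokenized:
        tf = Counter(doc)
        for term, freq in tf.items():
            scores[term] += (freq / len(doc)) * math.log(n / df[term])
    return [term for term, _ in scores.most_common(k)]
```

A term occurring in every document gets idf = log(1) = 0 and is ranked last, which is why corpus-wide function words tend to drop out even without a stopword list.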
More recently, Yao et al. (2011) present a methodology to benchmark an ontology against a reference corpus by first mapping concepts and relations to the corpus using NLP (Natural Language Processing) tools, and then estimating concept- and relation-specific frequency parameters to compute several similarity metrics between the ontology and the corpus. The authors rank five medical ontologies with respect to a medical corpus by taking into account precision and recall as well as the theoretical coverage and parsimony of the ontology's concepts. The metrics rely on the complete ontology created by incorporating all concepts and relations found in the reference corpus, which represents a kind of gold standard. However, the process to create this complete ontology applies only to the medical domain, since it is based on the UMLS MetaMap. Our approach, instead, relies on a general-purpose methodology, and the available system is able to deliver an evaluation for any ontology and domain corpus, provided that they are in a suitable format.
Cui (2010) compares coverage, semantic consistency, and agreement of four plant character ontologies by checking them against domain literature. However, the approach has been developed for the biodiversity domain and cannot be directly applied to other domains; in particular, the semantic annotation algorithm used to extract character states is domain-specific.
3. The Approach
In our evaluation framework, we assume that the knowledge domain that should be encoded in the ontology is represented through a domain corpus, and that the evaluation should output some measures that express the coverage and the adequacy of the ontology with respect to such domain. This is similar to the scenario presented by Brewster et al. (2004) and Cui (2010). For example, the corpus could consist of a document describing a certain knowledge ﬁeld, or a collection of articles concerning a speciﬁc topic.
Note that our approach works both with a corpus containing multiple documents and one formed by only a single (possibly long) text.
Given a corpus, the evaluation process is based on three steps, all performed in a pipeline without the need for human intervention:
1. Key-concept extraction: Extraction of a ranked list of key-concepts from the corpus. Some manual tuning of the extraction algorithm is possible but not necessary;
2. Enrichment with external resources: Enrichment of the ranked list with additional information (synonyms) from external resources (e.g. WordNet);
3. Matching & evaluation: Alignment between the ontology and the enriched ranked list of key-concepts, and computation of some ontology metrics based on these alignments.
A graphical representation of the workflow is displayed in Fig. 1. The individual steps are detailed in the following subsections.
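The three steps above can be sketched end-to-end as follows. Every function body is an illustrative stand-in: a raw-frequency extractor in place of the real key-concept extraction, a hand-coded synonym table in place of WordNet, and a plain coverage ratio in place of the metrics defined later.

```python
from collections import Counter

# End-to-end sketch of the three-step pipeline. All components are toy
# stand-ins for the actual extraction, enrichment, and metric modules.

STOPWORDS = {"the", "a", "of", "and", "in", "is"}
SYNONYMS = {"car": {"car", "automobile"}}  # illustrative WordNet stand-in

def extract_key_concepts(corpus, k=10):
    """Step 1: rank corpus terms by frequency (toy extractor)."""
    tokens = [t for doc in corpus for t in doc.lower().split()
              if t not in STOPWORDS]
    return [term for term, _ in Counter(tokens).most_common(k)]

def enrich(key_concepts):
    """Step 2: attach synonyms from the external resource to each key-concept."""
    return {kc: SYNONYMS.get(kc, {kc}) for kc in key_concepts}

def evaluate(ontology_labels, enriched):
    """Step 3: align ontology labels with the enriched key-concepts
    and return the resulting coverage ratio."""
    labels = {label.lower() for label in ontology_labels}
    covered = [kc for kc, syns in enriched.items() if syns & labels]
    return len(covered) / len(enriched) if enriched else 0.0
```

For instance, extracting from the toy corpus ["the car engine", "car and wheel"] and evaluating an ontology with labels "Automobile" and "Wheel" covers two of the three extracted key-concepts ("car" via its synonym, and "wheel" directly), but not "engine".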
3.1. Key-concept extraction

The first step aims at acquiring the terminology of the specific domain of interest, which is often seen as a useful starting point for supporting the creation of a domain ontology (see for example Liddle et al.