Corpus-based Terminological Evaluation of Ontologies


Marco Rospocher*, Sara Tonelli, Luciano Serafini and Emanuele Pianta

Fondazione Bruno Kessler-irst

Via Sommarive 18 Povo

I-38123, Trento, Italy

E-mail: {rospocher,satonelli,serafini,pianta}@fbk.eu

Abstract. We present a novel system for corpus-based terminological evaluation of ontologies. Starting from the assumption that a domain of interest can be represented through a corpus of text documents, we first extract a list of domain-specific key-concepts from the corpus, rank them by relevance, and then apply various evaluation metrics to assess the terminological coverage of a domain ontology with respect to the list of key-concepts.

Among the advantages of the proposed approach, we remark that the framework can be largely automated, requiring little human intervention. The evaluation framework is made available online through a collaborative wiki-based system, which can be accessed by different users, from domain experts to knowledge engineers.

We performed a comprehensive experimental analysis of our approach, showing that the proposed ontology metrics allow for assessing the terminological coverage of an ontology with respect to a given domain, and that our framework can be effectively applied to many evaluation-related scenarios.

Keywords: Corpus based ontology evaluation, terminological ontology evaluation, key-concept extraction, ontology building environment

1. Introduction

An ontology is a formal conceptualization of some domain of interest. Ontologies are increasingly used for organizing information in several application fields, including among others the Semantic Web, knowledge representation and management, biomedical informatics, software engineering, and enterprise management. Several methodologies and tools are available to support building and developing ontologies (see Gómez-Pérez et al. (2004) for a detailed overview of the Ontology Engineering field).

Like any engineering artifact, an ontology needs to undergo some exhaustive evaluation, for example to understand whether it adequately describes a given domain of interest, or to check whether it is formally correct. Ontology evaluation is the task of investigating the quality of an ontology. The investigation can concern different levels (as summarized in the survey provided in Brank et al. (2005)), such as the terminological level [1] (“Does the ontology represent the relevant terms of the domain of interest?”), the syntactic level (“Does the ontology match the syntactic requirements of the formal language adopted?”), the hierarchical or taxonomical level (“Does the ontology structurally fit the domain of interest?”), and the semantic level (“Does the underlying semantic model in the ontology correctly represent the domain of interest?”).

The contribution presented in this paper concerns the terminological level, since it aims at assessing whether an ontology adequately covers the domain of interest, i.e. whether the concepts used in the ontology comprehensively represent the relevant terms of a domain.

More specifically, we present a framework for the corpus-based terminological evaluation of ontologies, where an ontology is terminologically evaluated against a text corpus representative of the domain of interest. The approach is based on the extraction of a list of relevant concepts (aka key-concepts) from a domain corpus, ranked according to their relevance, and a matching-based comparison (aka matching) between the concepts formalized in the ontology and the extracted key-concepts. To obtain a more accurate result, the matching relies on synonymy information available in WordNet (Fellbaum, 1998). Based on the resulting matching, several evaluation metrics are defined to assess whether the given ontology adequately covers the terminology of the domain described by the text corpus.

* Corresponding author: Marco Rospocher, rospocher@fbk.eu.

[1] We prefer to adopt the term terminological in place of lexical/vocabulary used in Brank et al. (2005).

The terminological evaluation of ontologies has been widely studied in the literature (we refer the reader to Section 2 for an overview of available proposals, and a comprehensive comparison between them and our approach); nevertheless our contribution is novel in several respects, and presents many advantages over other state-of-the-art proposals:

High level of automation: human intervention is limited to the selection of the reference corpus and, possibly, to the tuning of the terminology extraction module. There is no need for a manually built gold standard;

On-line evaluation environment: a collaborative system fully implementing the proposed evaluation framework has been developed and made publicly available [2]. Therefore, users can exploit our framework to terminologically evaluate any OWL ontology against any text corpus (several popular digital text file formats are supported);

Domain independence: domain-specific language resources are not required;

Weighted coverage assessment: thanks to the relevance-based ordering of the key-concepts extracted from the corpus, the methodology can be applied to assess whether the most important concepts in the domain (as opposed to the marginal ones) are covered by the ontology.
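As a rough illustration of the weighted coverage idea, the sketch below computes the fraction of the top-k ranked key-concepts that also appear among the concept labels of an ontology. The function name `coverage_at_k`, the exact-match comparison on lowercased labels, and the toy data are assumptions made for illustration only; they are not the metrics defined by the framework.

```python
def coverage_at_k(ranked_key_concepts, ontology_labels, k):
    """Fraction of the k most relevant key-concepts found in the ontology.

    ranked_key_concepts: key-concepts ordered from most to least relevant.
    ontology_labels: labels of the concepts formalized in the ontology.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Assumption: a key-concept is covered if it equals a label, ignoring case.
    labels = {label.strip().lower() for label in ontology_labels}
    top_k = [kc.strip().lower() for kc in ranked_key_concepts[:k]]
    if not top_k:
        return 0.0
    covered = sum(1 for kc in top_k if kc in labels)
    return covered / len(top_k)


# Hypothetical ranked key-concepts and ontology labels.
ranked = ["pollen", "air quality", "ozone", "humidity", "bus stop"]
ontology = ["Pollen", "Ozone", "Wind", "Bus stop"]

top3 = coverage_at_k(ranked, ontology, 3)  # coverage of the 3 most relevant
top5 = coverage_at_k(ranked, ontology, 5)  # coverage of the whole list
```

Comparing the value for a small k with the value for the whole list gives a first indication of whether the ontology covers the central concepts of the domain or mostly the marginal ones.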

We perform a comprehensive experimental analysis of our approach, showing that the evaluation metrics proposed appropriately capture the terminological adequacy of an ontology with respect to a domain.

Such metrics can be employed also to effectively and efficiently rank candidate ontologies according to how they terminologically cover a given domain, or to understand which are domain-wise the most relevant concepts formalized in an ontology.

We remark that the contribution presented here makes it possible to evaluate the terminological level of the ontology, i.e. whether the terms used as concepts in the ontology are the relevant terms of a domain of interest, while it does not deal with the evaluation of the semantic level of the ontology, that is whether the axiomatization of the domain encoded in the ontology (i.e. the OWL axioms characterizing the concepts and properties in the ontology) is correct and complete.

The paper is structured as follows. In Section 2 we present a comprehensive overview of available proposals for ontology evaluation at terminological level. In Section 3 we describe our corpus-based ontology evaluation framework together with the ontology metrics we propose to adopt, while in Section 4 we describe the collaborative system we developed to implement the proposed approach. Furthermore, in Section 5 we detail some application scenarios in which our corpus-based evaluation framework can be effectively applied. In Section 6 we report the detailed experimental analysis that we performed to evaluate our framework, while in Section 7 some limitations of our approach are discussed. Finally, we draw some conclusions in Section 8 and present future research directions we plan to undertake.

2. Related work

Ontology evaluation can be based on different approaches. One of them is the manual revision by experts, which however has several drawbacks, being time-consuming and sensitive to the subjective nature of human interpretation and judgement. Some tools have been developed to support the user in manual ontology revision by assigning weights and values to the dimensions characterizing an ontology, for example the OntoMetric Tool (Lozano-Tello and Gómez-Pérez, 2004) and the COAT tool (Bolotnikova et al., 2011). The latter is focused on the evaluation of the cognitive ergonomicity of ontologies, i.e. on aspects concerning the human speed of perception and the cognitive soundness.

[2] To the best of our knowledge, this is the first collaborative system of this kind made publicly available.

As for automatic evaluation, some attempts have been made to define appropriate standards and requirements. A well-studied approach is the evaluation of the ontology against a reference ontology, aka gold standard. Many metrics have been proposed to compare ontologies both at lexical and conceptual level: Maedche and Staab (2002) measure both the lexical overlap between concept names and the taxonomic structure of two ontologies in an empirical study on the tourism domain, while Dellschaft and Staab (2006) suggest a number of criteria for evaluation as well as several measures of similarity. Note that although the comparison between the evaluated ontology and the gold standard can be easily automated, the building of the gold standard is still manual.

A further approach for ontology evaluation is application-based in that it measures the quality of an ontology based on the improvement achieved by an application that is built upon it. Porzel and Malaka (2004), for example, evaluate the accuracy of an ontology by integrating it in a system for relation tagging.

In this work, we propose a methodology to evaluate an ontology based on a domain corpus. Few works go in this direction, and no evaluation system has been made available so far. In Brewster et al. (2004) the authors present a data-driven methodology for evaluating an ontology by comparing it with a corpus representing the domain area. This approach is the most similar to ours, since it is based upon the same principle, i.e. a domain corpus can be used as a starting point to evaluate the terminological adequacy of an ontology representing the knowledge of the same domain. The authors present a first evaluation methodology based on a vector space representation of the terms shared by an ontology and a corpus. However, the corpus is built by collecting 41 arbitrary texts from the Internet concerning arts and artists, therefore it cannot be seen as a reference corpus for a domain. Besides, none of the five ontologies compared to the corpus has been independently evaluated, so no real evidence of the efficacy of this evaluation approach is given. The authors also present a more sophisticated methodology, proposing to measure the “fit” between an ontology and the corpus as the conditional probability of the ontology given a corpus. Although the approach seems very interesting, neither related experiments nor evaluation are reported.

Jones and Alani (2006) present a methodology inspired by Brewster et al. (2004), but select the corpus based on a Google query extended with WordNet terms. Tf-Idf (Term frequency / Inverse document frequency) is then applied to the corpus in order to extract the top 50 potential concept labels to match against the ontology. The authors show that their approach can be applied to rank 10 candidate ontologies according to the corpus domain, with high correlation with human judgement. However, their evaluation is focused only on the ranking and no attempts are made to find a relevance score that represents in absolute terms the quality of the ontology with respect to the domain. Besides, both the corpus creation and the term extraction are quite simplistic and may require some further refinement.
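To make the Tf-Idf step more concrete, the following sketch ranks the terms of a small corpus by a summed Tf-Idf score and keeps the top k. It is a minimal illustration under stated assumptions: whitespace tokenization, single-word terms, tf as relative frequency within a document, and idf as log(N / df). The exact formulation used by Jones and Alani (2006) is not given here, and the function name `top_tfidf_terms` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def top_tfidf_terms(documents, k):
    """Return the k terms with the highest Tf-Idf score summed over the corpus."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)            # relative term frequency
            idf = math.log(n_docs / df[term])   # 0 for terms in every document
            scores[term] += tf * idf
    # Sort by descending score, then alphabetically to make ties deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical three-document corpus.
corpus = [
    "ozone ozone level in the city",
    "pollen level in the city",
    "traffic in the city",
]
top_terms = top_tfidf_terms(corpus, 2)
```

Terms that occur in every document, such as "in" or "city" in this toy corpus, receive a score of zero and therefore never reach the top of the list, which is the behaviour that makes Tf-Idf a common baseline for picking candidate concept labels.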

More recently, Yao et al. (2011) present a methodology to benchmark an ontology against a reference corpus by first mapping concepts and relations to the corpus using NLP (Natural Language Processing) tools, and then estimating concept- and relation-specific frequency parameters to compute several similarity metrics between the ontology and the corpus. The authors rank five medical ontologies with respect to a medical corpus by taking into account precision and recall as well as the theoretical coverage and parsimony of the ontology’s concepts. The metrics rely on the complete ontology created by incorporating all concepts and relations found in the reference corpus, which represents some kind of gold standard. However, the process to create this complete ontology applies only to the medical domain, since it is based on the UMLS MetaMap. Our approach, instead, relies on a general purpose methodology, and the available system is able to deliver an evaluation for any ontology and domain corpus, given that they are in a suitable format.
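For readers less familiar with precision and recall in this setting, the sketch below computes both over plain sets of terms: precision as the share of ontology terms that also occur among the reference terms, and recall as the share of reference terms that the ontology contains. This set-based reading, the function name `precision_recall`, and the example terms are assumptions for illustration; they do not reproduce the frequency-based metrics of Yao et al. (2011).

```python
def precision_recall(ontology_terms, reference_terms):
    """Set-based precision and recall of ontology terms against reference terms."""
    ont = {t.lower() for t in ontology_terms}
    ref = {t.lower() for t in reference_terms}
    shared = ont & ref
    # Precision: how many ontology terms are relevant to the domain.
    precision = len(shared) / len(ont) if ont else 0.0
    # Recall: how many domain terms are covered by the ontology.
    recall = len(shared) / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical ontology labels and corpus-derived reference terms.
p, r = precision_recall(
    ["Disease", "Symptom", "Vehicle", "Drug"],
    ["disease", "symptom", "drug", "therapy", "diagnosis"],
)
```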

Cui (2010) compares coverage, semantic consistency, and agreement of four plant character ontologies by checking them against domain literature. However, the approach has been developed for the biodiversity domain and could not be readily applied to other domains; this holds especially for the semantic annotation algorithm used to extract character states.


3. The Approach

In our evaluation framework, we assume that the knowledge domain that should be encoded in the ontology is represented through a domain corpus, and that the evaluation should output some measures that express the coverage and the adequacy of the ontology with respect to such domain. This is similar to the scenario presented by Brewster et al. (2004) and Cui (2010). For example, the corpus could consist of a document describing a certain knowledge field, or a collection of articles concerning a specific topic.

Note that our approach works both with a corpus containing multiple documents and one formed by only a single (possibly long) text.

Given a corpus, the evaluation process is based on three steps, all performed in a pipeline without the need for human intervention:

1. Key-concept extraction: Extraction of a ranked list of key-concepts from the corpus. Some manual tuning of the extraction algorithm is possible but not necessary;

2. Enrichment with external resources: Enrichment of the ranked list with additional information (synonyms) from external resources (e.g. WordNet);

3. Matching & evaluation: Alignment between the ontology and the enriched ranked list of key-concepts, and computation of some ontology metrics based on these alignments.

A graphical representation of the workflow is displayed in Fig. 1. The single steps are detailed in the following subsections.
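The enrichment and matching steps listed above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the framework: the ranked key-concepts are taken as given (step 1), a hand-written synonym table stands in for WordNet (step 2), and matching is a case-insensitive comparison between ontology labels and the key-concept or any of its synonyms (step 3). The names `enrich` and `match` and all data are hypothetical.

```python
def enrich(ranked_key_concepts, synonyms):
    """Attach to each ranked key-concept the set of surface forms accepted for it."""
    enriched = []
    for concept in ranked_key_concepts:
        forms = {concept.lower()}
        forms.update(s.lower() for s in synonyms.get(concept, []))
        enriched.append((concept, forms))
    return enriched


def match(enriched, ontology_labels):
    """Map each key-concept to the ontology label it aligns with, or None."""
    by_lower = {label.lower(): label for label in ontology_labels}
    alignment = {}
    for concept, forms in enriched:
        # Sorting makes the choice deterministic if several forms match.
        hits = sorted(form for form in forms if form in by_lower)
        alignment[concept] = by_lower[hits[0]] if hits else None
    return alignment


# Step 1 output, assumed to come from a key-concept extractor (most relevant first).
ranked = ["car", "road", "traffic light", "pedestrian"]
# Step 2: hypothetical synonym table standing in for WordNet.
synonyms = {"car": ["automobile", "motorcar"], "road": ["route"]}
# Step 3: labels of the concepts in the ontology under evaluation.
ontology = ["Automobile", "Road", "Bicycle"]

alignment = match(enrich(ranked, synonyms), ontology)
matched = [concept for concept, label in alignment.items() if label is not None]
```

In this toy run the key-concept "car" is aligned with the ontology concept "Automobile" only thanks to the synonym table, which is the kind of match that the enrichment step is meant to recover. Ontology metrics can then be computed from `alignment`, for instance by counting matched key-concepts among the top-ranked ones.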


3.1. Key-concept extraction

The first step aims at acquiring the terminology in the specific domain of interest, which is often seen as a useful starting point for supporting the creation of a domain ontology (see for example Liddle et al.
