# «Abstract. We consider the problem of computing the logical diﬀer- ence between distinct versions of description logic terminologies. For the ...»

The Logical Diﬀerence Problem for Description

Logic Terminologies

Boris Konev, Dirk Walther, and Frank Wolter

University of Liverpool, Liverpool, UK

{konev, dwalther, wolter}@liverpool.ac.uk

Abstract. We consider the problem of computing the logical diﬀer-

ence between distinct versions of description logic terminologies. For the

lightweight description logic EL, we present a tractable algorithm which,

given two terminologies and a signature, outputs a set of concepts, which

can be regarded as the logical diﬀerence between the two terminologies.

As a consequence, the algorithm can also decide whether they imply the same concept implications in the signature. A prototype implementa- tion CEX of this algorithm is presented and experimental results based on distinct versions of Snomed ct, the Systematized Nomenclature of Medicine, Clinical Terms, are discussed. Finally, results regarding the re- lation to uniform interpolants and possible extensions to more expressive description logics are presented.

1 Introduction The standard diﬀ operation for text ﬁles is an indispensable tool for comparing diﬀerent versions of texts, and similar operations are available to software en- gineers comparing distinct versions of code produced in collaborative software projects. As observed, e.g., in [14], such a purely syntactic diﬀ operation is hardly useful if the text consists of a set of axioms of an ontology. In this case, one is usually not interested in a comparison of the syntactic form of axioms, but in the consequences that the ontologies have. The authors of [14] present a number of heuristic rules to address this problem and develop a diﬀ operator for ontolo- gies. Except theoretical results in [12, 13, 9], we are not aware of any logic-based approach to computing the logical diﬀ of ontologies.

Our formalisation of the logical diﬀerence problem is based on the observation that when comparing distinct versions of ontologies one should take into account their signatures. In fact, the interesting diﬀerences between ontologies are those formulated in their shared signature (or even subsets thereof), and not those involving symbols used only in one of the two ontologies. Thus, the proposed notion of logical diﬀerence is based on the notion of Σ-entailment: an ontology T Σ-entails an ontology T for a signature Σ, if for all concept implications D in Σ, T |= C D implies T |= C D. If T and T mutually Σ- C entail each other, then they are called Σ-inseparable. By taking Σ as the set of shared symbols of T and T, Σ-inseparability means that T and T are not distinguishable by means of concept implications in their shared signature. In this case, their logical diﬀerence will be regarded as empty.

We show that deciding Σ-entailment is tractable for EL-terminologies, i.e., sets of possibly cyclic concept deﬁnitions in the lightweight description logic EL; see [1, 10]. Observe that for ontologies formulated as general TBoxes in description logics, the computational complexity of deciding Σ-entailment is by at least one exponential harder than the deduction problem, e.g., it is 2ExpTimecomplete for expressive description logics such as ALC, ALCQ, and ALCQI [6, 12] and ExpTime-complete for EL itself [13]. Moreover, even in such simple formalisms as acyclic propositional Horn Logic Σ-entailment is co-NP-complete [5].

In applications, it is not enough to decide whether two ontologies are logically diﬀerent, but an informative list of diﬀerences is required. We show that D in the logical diﬀerence between two ELfor any concept implication C terminologies, there exist subconcepts C and D of C and D, respectively, such that C D is in the logical diﬀerence and C or D is a concept name. Thus, listing the set of all concept names involved in such implications appears to be an informative approximation of the logical diﬀerence between two EL-terminologies.

This list is empty if, and only if, there is no logical diﬀerence between the two terminologies.

The system CEX implements, by employing a dynamic programming approach, the algorithm deciding Σ-entailment and lists the set of logical differences described above for acyclic EL-terminologies. We present a variety of experiments in which CEX is applied to diﬀerent versions of Snomed ct, the Systematized Nomenclature of Medicine, Clinical Terms. This terminology comprises ∼0.4 million terms and underlies the systematised medical terminology used in the health systems of the US, the UK, and other countries [17].

Finally, we discuss an alternative approach to deciding Σ-entailment using uniform interpolants and explore the complexity of corresponding reasoning problems for acyclic ALC-terminologies.

Detailed proofs are provided in the technical report [11].

**2 Preliminaries**

Let NC and NR be countably inﬁnite and disjoint sets of concept names and role names, respectively. In the description logic EL, concepts C are built according to the syntax rule C ::= | A | C D | ∃r.C, where A ranges over NC, r ranges over NR, and C, D range over concepts. The semantics of concepts is deﬁned by means of interpretations I = (∆I, ·I ), where the interpretation domain ∆I is a non-empty set, and ·I is a function mapping each concept name A to a subset AI of ∆I and each role name rI to a binary relation rI ⊆ ∆I × ∆I. The function ·I is inductively extended to arbitrary concepts by setting I := ∆I, (C D)I := C I ∩ DI, and (∃r.C)I := {d ∈ ∆I | ∃e ∈ C I : (d, e) ∈ rI }.

A general TBox is a ﬁnite set of axioms, where an axiom can be either a D or a concept equality (CE) C ≡ D, where C, D concept inclusion (CI) C are concepts. An interpretation I satisﬁes a CI C D (written I |= C D) if C I ⊆ DI ; it satisﬁes a CE C ≡ D (written I |= C ≡ D) if C I = DI. I is a model of a general TBox T if it satisﬁes all axioms in T. We write T |= C D (T |= C ≡ D) if every model of T satisﬁes C D (C ≡ D, respectively).

Our main concern in this paper are terminologies, i.e., general TBoxes T

**satisfying the following two conditions:**

– T consists of CEs of the form A ≡ C (concept deﬁnitions) and CIs of the form A C (primitive concept deﬁnitions) only, where A is a concept name;

– no concept name occurs more than once on the left hand side of an axiom in T.

Deﬁne the relation T between concept names by setting A T B if there exists an axiom of the form A ≡ C or A C in T such that B occurs in C. A terminology T is called acyclic if the transitive closure ∗ of T is irreﬂexive.

T A signature Σ is a ﬁnite subset of NC ∪ NR. The signature sig(C) (sig(α), sig(T )) of a concept C (axiom α, terminology T ) is the set of concept and role names which occur in C (α, T, respectively). If sig(C) ⊆ Σ, we also call C a Σ-concept and similarly for axioms and terminologies.

Deﬁnition 1 (Σ-diﬀerence, Σ-entailment). Let T and T be terminologies and Σ a signature. The Σ-diﬀerence, Diﬀ Σ (T, T ), between T and T is deﬁned as Diﬀ Σ (T, T ) = {C D | T |= C D and T |= C D) ⊆ Σ}.

D and sig(C T Σ-entails T if, and only if, Diﬀ Σ (T, T ) = ∅. T and T are called Σinseparable if T and T Σ-entail each other.

** Example 1. Observe that, in some cases, Diﬀ Σ (T, T ) only contains concept implications of at least exponential size, even for acyclic terminologies.**

To start with, let T = ∅,

** T = {A0 B0, A1 ≡ Bn } ∪ {Bi+1 ≡ ∃r.Bi ∃s.Bi | 0 ≤ i n},**

and Σ = {A0, A1, r, s}. Then T is not Σ-entailed by T, and a minimal implication of the form C A1 in Diﬀ Σ (T, T ) is given by Cn A1, where C0 = A0 and Ci+1 = ∃r.Ci ∃s.Ci, for i ≥ 0. Clearly, Cn is of exponential size. Observe, however, that there exist much smaller implications than Cn A1 in Diﬀ Σ (T, T ).

∃r., A1 ∃s., A1 ∃r. ∃s., etc. To avoid this type of Namely, A1 implications in Diﬀ Σ (T, T ) replace T by

Observe that if T Σ-entails T, then T Σ -entails T for any Σ with Σ ∩ sig(T ) ⊆ Σ. This follows immediately from the following interpolation result [16].

** Theorem 1. EL has the interpolation property, i.**

e., if T |= C D, then there exists a ﬁnite set T0 of CIs with sig(T0 ) ⊆ sig(T ) ∩ sig(C D) such that T |= T0 and T0 |= C D.

** Basic properties of EL3**

We derive basic properties of EL from the Gentzen-style sequent calculus of Hofmann [10], see Figure 1.1 The basic calculus of [10] considers EL without the constant and for terminologies without primitive concept deﬁnitions. To take care of, we have added the rule (AxTop), and (PDefL) is the rule representing axioms of the form A C. Cut-elimination, completeness, and correctness can now be proved by a straightforward extension of the proof in [10].

For a terminology T and concepts C, D, we write T C D iﬀ there exists a proof of C D in the calculus of Figure 1.

Theorem 2 (Hofmann). For all terminologies T and concepts C, D, it holds that T |= C D if, and only if, T C D.

We apply this calculus to derive a description of the syntactic form of concepts C such that T |= C D, where D is not equivalent to a conjunction. Call a concept name A primitive in T if A does not occur on the left hand side of an axiom in T. Call A pseudo-primitive in T if it is primitive in T or occurs on the left hand side of primitive concept deﬁnitions in T. In what follows, we say that a concept F is a conjunction of concepts if F = D∈X D, for a set X of concepts. Any D ∈ X is then called a conjunct of F and, if D is a concept name, 1 Alternatively, one could start from the model-theoretic analysis of EL terminologies in [1].

then it is called an atomic conjunct of F. We sometimes write D ∈ F instead of D ∈ X and if X is empty, then F denotes the concept.

** Lemma 1. Let T be a terminology and C = F (r,D)∈Q ∃r.**

D, where F is a conjunction of concept names and Q is a set of pairs (r, D) in which r is a role and D a concept.

1. If T |= C A for an A which is pseudo-primitive in T, then T |= B A, for some atomic conjunct B of F.

2. If T |= C ∃s.C0, then – T |= B ∃s.C0, for some atomic conjunct B of F, or – there exists (r, D) ∈ Q such that r = s and T |= D C0.

Proof. We use Theorem 2 and prove Point 1. Point 2 is proved similarly. Let T C A, where A is pseudo-primitive in T. Let D be a proof of C A. Note that, since A is pseudo-primitive in T, D can only end with one of Ax, AndL1, AndL2, DefL, or PDefL. We show that then T B A, for some conjunct B of F, by induction on the number n of conjuncts in C.

The base case of n = 1 is trivial: D can only end with one of Ax, PDefL, or DefL; so, C is a concept name itself.

** Assume n 1. Then D can only end with one of AndL1 or AndL2.**

In any case, there exists a conjunct C of C such that T C A and C contains less conjuncts than C. By induction, there exists a concept name B which is a conjunct in C such that T B A. Note now that B is also a conjunct of C.

We apply Lemma 1 to show that if T does not Σ-entail T, then there exists C D ∈ Diﬀ Σ (T, T ) such that C or D is a concept name.

Proof. Let C D ∈ Diﬀ Σ (T, T ). Then D = because otherwise T |= C D.

If D = D1 D2, then one of C Di, i = 1, 2, is in the Σ-diﬀerence. If D = ∃r.D1 then, by Lemma 1, either there exists a subconcept A of C, A a concept name, such that A D is in the Σ-diﬀerence, or there exists a subconcept ∃r.C1 of C, such that C1 D1 is in the Σ-diﬀerence. Simplify C D until none of these simpliﬁcation rules is applicable. The resulting CI is as required.

4 Deciding Σ-entailment: theory By Lemma 2, to decide Σ-entailment, it is suﬃcient to decide whether the set Diﬀ Σ (T, T ) contains Σ-implications of the form C A or A D, where A is a concept name. The latter problem is decidable in polynomial time already for general EL-TBoxes [13]. So, in what follows we concentrate on Σ-implications of the form C A. We ﬁrst transform T into a normalised terminology. A concept For A ∈ NC, – if A is pseudo-primitive in T, then

name A is called non-conjunctive in T if it is pseudo-primitive in T or has a deﬁnition of the form A ≡ ∃r.C ∈ T. Otherwise A is called conjunctive in T. A

**terminology T is normalised if it consists of axioms of the following form:**

– A ≡ ∃r.B or A ∃r.B, where B is a concept name;

– A ≡ F or A F, where F is a (possibly empty) conjunction of concept names such that every conjunct B of F is non-conjunctive in T.

Normalised terminologies in the sense deﬁned above are a minor modiﬁcation of normalised terminologies as deﬁned in [1]. Say that two interpretations I and J coincide on a signature Σ, in symbols I|Σ = J |Σ, if ∆I = ∆J and X I = X J for all X ∈ Σ.

** Lemma 3. For every terminology T, one can construct in polynomial time a normalised terminology T of polynomial size in |T | such that sig(T ) ⊆ sig(T ), T |= T, and for every model I of T there exists a model J of T which coincides with I on Σ.**

Moreover, T is acyclic if T is acyclic.

The proof is a straightforward modiﬁcation of the proof in [1]. From now on we will work with normalised terminologies only.

A ∈ Diﬀ Σ (T, T ), we want Intuitively, to decide whether there exists C to construct the most speciﬁc2 Σ-concept CA such that T |= CA A. Then there exists some Σ-concept C such that C A ∈ Diﬀ Σ (T, T ) if, and only if, T |= CA A. Unfortunately, most speciﬁc Σ-concepts with this property do not always exist and, therefore (and also to enable structure sharing), we use an

**additional terminology. We use the following sets and axiom:**

– Σ fresh = {AllΣ } ∪ {ξA | A ∈ NC non-conjunctive in T }, where AllΣ and each ξA are fresh concept names not occurring in Σ ∪ sig(T );

2 Recall that a concept C is more speciﬁc than a concept D if |= C D.

r∈Σ ∃r.( A ∈Σ A – α denotes the concept inclusion AllΣ AllΣ );

– preΣ (A) = {B ∈ Σ | T |= B A}, for A ∈ NC. These sets can be computed T in polynomial time [1].

** Theorem 3. Let T be a normalised terminology and Σ a signature.**