# «Chapter 4 Conﬂict-Driven Clause Learning SAT Solvers Joao Marques-Silva, Ines Lynce and Sharad Malik 4.1. Introduction One of the main reasons for ...»

Handbook of Satisﬁability 131

Armin Biere, Marijn Heule, Hans van Maaren and Toby Walsh (Eds.)

IOS Press, 2009

c 2009 Joao Marques-Silva, Ines Lynce and Sharad Malik and IOS Press. All

rights reserved.

doi:10.3233/978-1-58603-929-5-131

Chapter 4

Conﬂict-Driven Clause Learning

SAT Solvers

Joao Marques-Silva, Ines Lynce and Sharad Malik

4.1. Introduction

One of the main reasons for the widespread use of SAT in many applications is that Conﬂict-Driven Clause Learning (CDCL) Boolean Satisﬁability (SAT) solvers are so eﬀective in practice. Since their inception in the mid-90s, CDCL SAT solvers have been applied, in many cases with remarkable success, to a number of practi- cal applications. Examples of applications include hardware and software model checking, planning, equivalence checking, bioinformatics, hardware and software test pattern generation, software package dependencies, and cryptography. This chapter surveys the organization of CDCL solvers, from the original solvers that inspired modern CDCL SAT solvers, to the most recent and proven techniques.

The organization of CDCL SAT solvers is primarily inspired by DPLL solvers.

As a result, and even though the chapter is self-contained, a reasonable knowledge of the organization of DPLL is assumed. In order to oﬀer a detailed account of CDCL SAT solvers, a number of concepts have to be introduced, which serve to formalize the operations implemented by any DPLL SAT solver.

DPLL corresponds to backtrack search, where at each step a variable and a propositional value are selected for branching purposes. With each branching step, two values can be assigned to a variable, either 0 or 1. Branching cor- responds to assigning the chosen value to the chosen variable. Afterwards, the logical consequences of each branching step are evaluated. Each time an unsat- isﬁed clause (i.e. a conﬂict) is identiﬁed, backtracking is executed. Backtracking corresponds to undoing branching steps until an unﬂipped branch is reached.

When both values have been assigned to the selected variable at a branching step, backtracking will undo this branching step. If for the ﬁrst branching step both values have been considered, and backtracking undoes this ﬁrst branching step, then the CNF formula can be declared unsatisﬁable. This kind of back- tracking is called chronological backtracking. An alternative backtracking scheme is non-chronological backtracking, which is described later in this chapter. A more detailed description of the DPLL algorithm is given in Part 1, Chapter 3.

132 Chapter 4. CDCL Solvers Besides using DPLL, building a state-of-the-art CDCL SAT solver involves a

**number of additional key techniques:**

• Learning new clauses from conﬂicts during backtrack search [MSS96].

• Exploiting structure of conﬂicts during clause learning [MSS96].

• Using lazy data structures for the representation of formulas [MMZ+ 01].

• Branching heuristics must have low computational overhead, and must receive feedback from backtrack search [MMZ+ 01].

• Periodically restarting backtrack search [GSK98].

• Additional techniques include deletion policies for learnt clauses [GN02], the actual implementation of lazy data structures [Rya04], the organization of unit propagation [LSB05], among others.

The chapter is organized as follows. The next section introduces the notation used throughout the chapter. Afterwards, Section 4.3 summarizes the organization of modern CDCL SAT solvers. Section 4.4 details conﬂict analysis, the procedure used for learning new clauses. Section 4.5 outlines more recent techniques that have been shown to be eﬀective in practice. The chapter concludes in Section 4.6 by providing a historical perspective of the work on CDCL SAT solvers.

**4.2. Notation**

In this chapter, propositional formulas are represented in Conjunctive Normal Form (CNF). A ﬁnite set of Boolean variables is assumed X = {x1, x2, x3,..., xn }.

A CNF formula ϕ consists of a conjunction of clauses ω, each of which consists of a disjunction of literals. A literal is either a variable xi or its complement ¬xi.

A CNF formula can also be viewed as a set of clauses, and each clause can be viewed as a set of literals. Throughout this chapter, the representation used will be clear from the context.

**Example 4.2.1 (CNF Formula). An example of a CNF formula is:**

In the context of search algorithms for SAT, variables can be assigned a logic value, either 0 or 1. Alternatively, variables may also be unassigned. Assignments to the problem variables can be deﬁned as a function ν : X → {0, u, 1}, where u denotes an undeﬁned value used when a variable has not been assigned a value in {0, 1}. Given an assignment ν, if all variables are assigned a value in {0, 1}, then ν is referred to as a complete assignment. Otherwise it is a partial assignment.

Assignments serve for computing the values of literals, clauses and the complete CNF formula, respectively, lν, ω ν and ϕν. A total order is deﬁned on the possible assignments, 0 u 1. Moreover, 1 − u = u. As a result, the following Chapter 4. CDCL Solvers 133

**deﬁnitions apply:**

The assignment function ν will also be viewed as a set of tuples (xi, vi ), with vi ∈ {0, 1}. Adding a tuple (xi, vi ) to ν corresponds to assigning vi to xi, such that ν(xi ) = vi. Removing a tuple (xi, vi ) from ν, with ν(xi ) = u, corresponds to assigning u to xi.

Clauses are characterized as unsatisﬁed, satisﬁed, unit or unresolved. A clause is unsatisﬁed if all its literals are assigned value 0. A clause is satisﬁed if at least one of its literals is assigned value 1. A clause is unit if all literals but one are assigned value 0, and the remaining literal is unassigned. Finally, a clause is unresolved if it is neither unsatisﬁed, nor satisﬁed, nor unit.

A key procedure in SAT solvers is the unit clause rule [DP60]: if a clause is unit, then its sole unassigned literal must be assigned value 1 for the clause to be satisﬁed. The iterated application of the unit clause rule is referred to as unit propagation or Boolean constraint propagation (BCP) [ZM88]. In modern CDCL solvers, as in most implementations of DPLL, logical consequences are derived with unit propagation. Unit propagation is applied after each branching step (and also during preprocessing), and is used for identifying variables which must be assigned a speciﬁc Boolean value. If an unsatisﬁed clause is identiﬁed, a conﬂict condition is declared, and the algorithm backtracks.

In CDCL SAT solvers, each variable xi is characterized by a number of properties, including the value, the antecedent and the decision level, denoted respectively by ν(vi ) ∈ {0, u, 1}, α(xi ) ∈ ϕ ∪ {NIL}, and δ(xi ) ∈ {−1, 0, 1,..., |X|}. A variable xi that is assigned a value as the result of applying the unit clause rule is said to be implied. The unit clause ω used for implying variable xi is said to be the antecedent of xi, α(xi ) = ω. For variables that are decision variables or are unassigned, the antecedent is NIL. Hence, antecedents are only deﬁned for variables whose value is implied by other assignments. The decision level of a variable xi denotes the depth of the decision tree at which the variable is assigned a value in {0, 1}. The decision level for an unassigned variable xi is −1, δ(xi ) = −1. The decision level associated with variables used for branching steps (i.e. decision assignments) is speciﬁed by the search process, and denotes the current depth of the decision stack. Hence, a variable xi associated with a decision assignment is characterized by having α(xi ) = NIL and δ(xi ) 0. More formally, the decision

**level of xi with antecedent ω is given by:**

** δ(xi ) = max({0} ∪ {δ(xj ) | xj ∈ ω ∧ xj = xi }) (4.6)**

i.e. the decision level of an implied literal is either the highest decision level of the implied literals in a unit clause, or it is 0 in case the clause is unit. The notation xi = v @ d is used to denote that ν(xi ) = v and δ(xi ) = d. Moreover, the decision level of a literal is deﬁned as the decision level of its variable, δ(l) = δ(xi ) if l = xi or l = ¬xi.

## 134 Chapter 4. CDCL Solvers

Assume decision assignments x21 = 0@2 and x31 = 0@3. Moreover, assume the current decision assignment x1 = 0@5. The resulting implication graph is shown in ﬁgure 4.2, and yields a conﬂict because clause (x5 ∨ x6 ) becomes unsatisﬁed.

4.3. Organization of CDCL Solvers Algorithm 4.1 shows the standard organization of a CDCL SAT solver, which essentially follows the organization of DPLL. With respect to DPLL, the main

## 136 Chapter 4. CDCL Solvers

diﬀerences are the call to function ConflictAnalysis each time a conﬂict is identiﬁed, and the call to Backtrack when backtracking takes place. Moreover, the Backtrack procedure allows for backtracking non-chronologically.

In addition to the main CDCL function, the following auxiliary functions are

**used:**

• UnitPropagation consists of the iterated application of the unit clause rule. If an unsatisﬁed clause is identiﬁed, then a conﬂict indication is returned.

• PickBranchingVariable consists of selecting a variable to assign and the respective value.

• ConflictAnalysis consists of analyzing the most recent conﬂict and learning a new clause from the conﬂict. The organization of this procedure is described in section 4.4.

• Backtrack backtracks to the decision level computed by ConflictAnalysis.

• AllVariablesAssigned tests whether all variables have been assigned, in which case the algorithm terminates indicating that the CNF formula is satisﬁable. An alternative criterion to stop execution of the algorithm is to check whether all clauses are satisﬁed. However, in modern SAT solvers that use lazy data structures, clause state cannot be maintained accurately, and so the termination criterion must be whether all variables are assigned.

Arguments to the auxiliary functions are assumed to be passed by reference.

Hence, ϕ and ν are supposed to be modiﬁed during execution of the auxiliary functions.

The typical CDCL algorithm shown does not account for a few often used techniques, namely search restarts [GSK98, BMS00] and implementation of clause deletion policies [GN02]. Search restarts cause the algorithm to restart itself, but already learnt clauses are kept. Clause deletion policies are used to decide learnt clauses that can be deleted. Clause deletion allows the memory usage of the SAT solver to be kept under control.

## Chapter 4. CDCL Solvers 137

4.4. Conﬂict Analysis This section outlines the conﬂict analysis procedure used by modern SAT solvers.

4.4.1. Learning Clauses from Conﬂicts Each time the CDCL SAT solver identiﬁes a conﬂict due to unit propagation, the ConflictAnalysis procedure is invoked. As a result, one or more new clauses are learnt, and a backtracking decision level is computed. The conﬂict analysis procedure analyzes the structure of unit propagation and decides which literals to include in the learnt clause.

The decision levels associated with assigned variables deﬁne a partial order of the variables. Starting from a given unsatisﬁed clause (represented in the implication graph with vertex κ), the conﬂict analysis procedure visits variables implied at the most recent decision level (i.e. the current largest decision level), identiﬁes the antecedents of visited variables, and keeps from the antecedents the literals assigned at decision levels less than the most recent decision level. This process is repeated until the most recent decision variable is visited.

Let d be the current decision level, let xi be the decision variable, let ν(xi ) = v be the decision assignment, and let ωj be an unsatisﬁed clause identiﬁed with unit propagation. In terms of the implication graph, the conﬂict vertex κ is such that α(κ) = ωj. Moreover, let represent the resolution operator. For two clauses ωj and ωk, for which there is a unique variable x such that one clause has a literal x and the other has literal ¬x, ωj ωk contains all the literals of ωj and ωk with the exception of x and ¬x.

The clause learning procedure used in SAT solvers can be deﬁned by a sequence of selective resolution operations [MSS00, BKS04], that at each step yields a new temporary clause. First, deﬁne a predicate that holds if a clause ω has an

**implied literal l assigned at the current decision level d:**

Equation (4.16) can be used for formalizing the clause learning procedure.

The ﬁrst condition, i = 0, denotes the initialization step given κ in I, where all literals in the unsatisﬁed clause are added to the ﬁrst intermediate clause clause.

Afterwards, at each step i, a literal l assigned at the current decision level d is d,i−1 selected and the intermediate clause (i.e. ωL ) is resolved with the antecedent of l.

## 138 Chapter 4. CDCL Solvers

d,i d,i−1 For an iteration i such that ωL = ωL, then a ﬁxed point is reached, and d,i ωL ωL represents the new learnt clause. Observe that the number of resolution operations represented by (4.16) is no greater than |X|.

Modern SAT solvers implement an additional reﬁnement of equation (4.16), by further exploiting the structure of implied assignments. This is discussed in sub-section 4.4.3.

** Example 4.4.**

1 (Clause Learning). Consider example 4.2.4. Applying clause learning to this example, results in the intermediate clauses shown in table 4.1.