# «Dr.Fill: Crosswords and an Implemented Solver for Singly Weighted CSPs Matthew L. Ginsberg On Time Systems, Inc. 355 Goodpasture Island Road, Suite ...»

Journal of Artiﬁcial Intelligence Research 42 (2011) 851-886 Submitted 07/11; published 12/11

Dr.Fill: Crosswords and an Implemented

Solver for Singly Weighted CSPs

Matthew L. Ginsberg

On Time Systems, Inc.

355 Goodpasture Island Road, Suite 200

Eugene, Oregon 97401

Abstract

We describe Dr.Fill, a program that solves American-style crossword puzzles.

From a technical perspective, Dr.Fill works by converting crosswords to weighted

csps, and then using a variety of novel techniques to ﬁnd a solution. These techniques include generally applicable heuristics for variable and value selection, a variant of limited discrepancy search, and postprocessing and partitioning ideas. Branch and bound is not used, as it was incompatible with postprocessing and was determined experimentally to be of little practical value. Dr.Fill’s performance on crosswords from the American Crossword Puzzle Tournament suggests that it ranks among the top ﬁfty or so crossword solvers in the world.

1. Introduction In recent years, there has been interest in solving constraint-satisfaction problems, or csps, where some of the constraints are “soft” in that while their satisfaction is desirable, it is not strictly required in a solution. As an example, if a construction problem is modeled as a csp, it may be possible to overutilize a particular labor resource by paying the associated workers overtime. While not the cheapest way to construct the artifact in question, the corresponding solution is certainly viable in practice.

Soft constraints can be modeled by assigning a cost to violating any such constraint, and then looking for that solution to the original csp for which the accumulated cost is minimized.

By and large, work on these systems has been primarily theoretical as various tech- niques for solving these “weighted” csps (wcsps) are considered and evaluated without the experimental support of an underlying implementation on a real-world problem. Theoret- ical complexity results have been obtained, and the general consensus appears to be that some sort of branch-and-bound method should be used by the solver, where the cost of one potential solution is used to bound and thereby restrict the subsequent search for possible improvements.

Our goal in this paper is to evaluate possible wcsp algorithms in a more “practical” setting, to wit, the development of a program (Dr.Fill) designed to solve American-style crossword puzzles. Based on the search engine underlying Dr.Fill, our basic conclusions

**are as follows:**

c 2011 AI Access Foundation. All rights reserved.

Ginsberg

1. We present speciﬁc variable- and value-selection heuristics that improve the eﬀective- ness of the search enormously.

2. The most eﬀective search technique appears to be a modiﬁcation of limited discrepancy search (lds) (Harvey & Ginsberg, 1995).

3. Branch-and-bound appears not to be a terribly eﬀective solution technique for at least some problems of this sort.

4. Postprocessing complete candidate solutions improves the eﬀectiveness of the search.

A more complete description of the crossword domain can be found in Section 2.2;

example crosswords appear in Figures 1 and 2. The overall view we will take is that, given both a speciﬁc crossword clue c and possible solution word or “ﬁll” f, there is an associated score p(f |c) that gives the probability that the ﬁll is correct, given the clue. Assuming that these probabilities are independent for diﬀerent clues, the probability that a collection of ﬁlls solves a puzzle correctly is then simply p(fi |ci ) (1) i where fi is the ﬁll entered in response to clue ci. Dr.Fill’s goal is to ﬁnd a set of ﬁlls that is legal (in that intersecting words share a letter at the square of intersection) while maximizing (1).

For human solvers, p(f |c) will in general be zero except for a handful of candidate ﬁlls that conform to full domain knowledge. Thus a “1973 nonﬁction best seller about a woman with multiple personalities” must be “Sybil”; a 3-letter “Black Halloween animal” might be “bat” or “cat”, and so on. For Dr.Fill, complete domain knowledge is impractical and much greater use is made of the crossing words, as the csp solver exploits the hard constraints in the problem to restrict the set of candidate solutions.

Dr.Fill’s performance as a solver is comparable to (but signiﬁcantly faster than) all but the very best human solvers. In solving New York Times crosswords (which increase in diﬃculty from Monday to Saturday, with the large Sunday puzzles comparable to Thursdays in diﬃculty), Dr.Fill generally solves Monday to Wednesday puzzles fairly easily, does well on Friday and Saturday puzzles, but often struggles with Thursday and Sunday puzzles.

These puzzles frequently involve some sort of a “gimmick” where the clues or ﬁll have been modiﬁed in some nonstandard way in order to make the puzzle more challenging. When run on puzzles from the American Crossword Puzzle Tournament, the annual national gathering of top solvers in the New York City area, Dr.Fill’s performance puts it in the top ﬁfty or so of the approximately six hundred solvers who typically attend.

The outline of this paper is as follows. Preliminaries are contained in the next section, both formal preliminaries regarding csps in Section 2.1 and a discussion of crosswords in Section 2.2. Section 2.3 discusses crosswords as csps speciﬁcally, including a description of the variety of ways in which crosswords diﬀer from the problems typically considered by the constraint satisfaction community.

The heuristics used by Dr.Fill are described in Section 3, with value-selection heuristics the topic of Section 3.1 and variable-selection heuristics the topic of Section 3.2. The

** 852 Dr.Fill: A Crossword Solver**

techniques for both value and variable selection can be applied to wcsps generally, although it is not clear how dependent their usefulness is on the crossword-speciﬁc features described in Section 2.3.

Our modiﬁcation of lds is described in Section 4, and is followed in Section 5 with our ﬁrst discussion of the experimental performance of our methods. Algorithmic extensions involving postprocessing are discussed in Section 6, which also discusses the reasons that branch-and-bound techniques are not likely to work well in this domain. Branch-andbound and postprocessing are not compatible but the arguments against branch-and-bound are deeper than that. Section 7 describes the utility of splitting a crossword into smaller problems when the associated constraint graph disconnects, an idea dating back to work of Freuder and Quinn (1985) but somewhat diﬀerent in the setting provided by lds.

Section 8 concludes by describing related and future work, including the earlier crossword solvers Proverb (Littman, Keim, & Shzaeer, 2002) and WebCrow (Ernandes, Angelini, & Gori, 2005), and the Jeopardy-playing program Watson (Ferrucci, Brown, Chu-Carroll, Fan, Gondek, Kalyanpur, Lally, Murdock, Nyberg, Prager, Schlaefer, & Welty, 2010).

2. Preliminaries In this section, we give a brief overview of constraint satisfaction, crossword puzzles, and the relationship between the two.

2.1 Constraint Satisfaction In a conventional constraint-satisfaction problem, or csp, the goal is to assign values to variables while satisfying a set of constraints. The constraints indicate that certain values for one variable, say v1, are inconsistent with other speciﬁc values for a diﬀerent variable v2.

Map coloring is a typical example. If a particular country is colored red, then neighboring countries are not permitted to be the same color.

We will formulate crossword solving by associating a variable to each word in the crossword, with the value of the variable being the associated ﬁll. The fact that the ﬁrst letter of the word at 1-Across has to match the ﬁrst letter of the word at 1-Down corresponds to a constraint between the two variables in question.

A basic csp, then, consists of a set V of variables, a set D of domains, one for each variable, from which the variables’ values are to be taken, and a set of constraints.

Deﬁnition 2.1 Given a set of domains D and set V of variables, an n-ary constraint λ is a pair (T, U ) where T ⊆ V is of size n and U is a subset of the allowed sets of values for the variables in V. An assignment S is a mapping from each variable v ∈ V to an element of v’s domain. For T ⊆ V, the restriction of S to T, to be denoted S|T, is the restriction of the mapping S to the set T. We will say that S satisﬁes the constraint (T, U ) if S|T ∈ U.

The constraint simply speciﬁes the sets of values that are allowed for the various variables involved.

As an example, imagine coloring a map of Europe using the four colors red, green, blue and yellow. Now T might be the set {France, Spain} (which share a border), D would assign the domain {red, green, blue, yellow} to each variable, and U would be the twelve ordered

** 853 Ginsberg**

pairs of distinct colors from the four colors available. The associated constraint indicates that France and Spain cannot be colored the same color.

As we have remarked, we will take the view that the variables in a crossword correspond to the various slots into which words must be entered, and the values to be all of the words in some putative dictionary from which the ﬁlls are taken (but see the comments in Section 2.3). If two words intersect (e.g., 1-Across and 1-Down, typically), there is a binary constraint excluding all pairs of words for which shared letters diﬀer.

Deﬁnition 2.2 A constraint-satisfaction problem, or csp, is a triple (V, D, Λ) where V is a set of variables, D gives a domain for each variable in V, and Λ is a set of constraints.

|V | will be called the size of the csp. If every constraint in Λ is either unary or binary, the csp is called a binary csp.

For a csp C, we will denote the set of variables in C by VC, the domains by DC, and the constraints by ΛC.

A solution to a csp is an assignment that satisﬁes every constraint in Λ.

Both map coloring and crossword solving as described above are binary csps.

There is an extensive literature on csps, describing both their applicability to a wide range of problems and various techniques that are eﬀective in solving them. It is not practical for me to repeat that literature here, but there are two points that are particularly salient.

First, csps are generally solved using some sort of backtracking technique. Values are assigned to variables; when a conﬂict is discovered, a backtrack occurs that is designed to correct the source of the problem and allow the search to proceed. As with other chronologically based backtracking schemes, there are well-known heuristics for selecting the variables to be valued and the values to be used, and the most eﬀective backtracking techniques use some kind of nogood reasoning (Doyle, 1979; Ginsberg, Frank, Halpin, & Torrance, 1990, and many others) to ensure that the backtrack will be able to make progress.

We will need to formalize this slightly.

Deﬁnition 2.3 Let C be a csp, and suppose that v is a variable in VC and x a value in the associated domain in DC. By C|v=x we will denote the csp obtained by setting v to x.

In other words, C|v=x = (VC − v, DC, Λ) where Λ consists of all constraints λ such that for some constraint (T, U ) ∈ ΛC

C|v=x will be called the restriction of C to v = x.

The notation may be intimidating but the idea is simple: Values permitted by constraints in the new problem are just those permitted in the old problem, given that we have decided to set v to x. So if an original constraint doesn’t mention v (the top line in (2)), the new constraint is unchanged. If v is mentioned, we see which values for the other variables are allowed, given that v itself is known to take the value x.

Deﬁnition 2.4 Let S be a partial solution to a csp C, in that S maps some of the variables in VC to elements of DC. The restriction of C to S, to be denoted C|S, is the csp obtained by successively restricting C to each of the assignments in S as in Deﬁnition 2.3.

**This deﬁnition is well-deﬁned because we obviously have:**

Backtracking csp solvers work by selecting variables, trying various values for the variables so selected, and then recursively solving the restricted problems generated by setting the variable in question to the value chosen.

The second general point we would like to make regarding csp solvers is that most implementations use some kind of forward checking to help maintain consistency as the search proceeds. As an example, suppose that we are about to assign a value x to a variable v, but that if we do this, then every possible value for some other variable v will be eliminated by the constraints. In this case, we can obviously eliminate x as a value for v.

It is worth formalizing this a bit, although we do so only in the most general of terms.

Deﬁnition 2.6 A propagation mechanism π is a mapping from csps to csps that does not change the variables, so that π(V, D, Λ) = (V, D, Λ ) for any (V, D, Λ). We also require that for any variable in V, the associated domain in D is a subset of the associated domain in D, and if λ = (T, U ) ∈ Λ, there must be a λ = (T, U ) ∈ Λ with U ⊆ U. We will say that π is sound if, for any csp C and solution S to C, S is also a solution to π(C).

The propagation mechanism strengthens the constraints in the problem, and may reduce some of the variable domains as well. It is sound if it never discards a solution to the original csp.

A wide range of propagation mechanisms has been discussed in the literature. Simplest, of course, is to simply eliminate variable values that can be shown to violate one of the constraints in the problem. Iterating this idea recursively until quiescence (Mackworth,

1977) leads to the well known AC-3 algorithm, which preserves arc consistency as the csp is solved.