FREE ELECTRONIC LIBRARY - Dissertations, online materials

Pages:   || 2 | 3 | 4 | 5 |   ...   | 7 |

«Dr.Fill: Crosswords and an Implemented Solver for Singly Weighted CSPs Matthew L. Ginsberg On Time Systems, Inc. 355 Goodpasture Island Road, Suite ...»

-- [ Page 1 ] --

Journal of Artificial Intelligence Research 42 (2011) 851-886 Submitted 07/11; published 12/11

Dr.Fill: Crosswords and an Implemented

Solver for Singly Weighted CSPs

Matthew L. Ginsberg

On Time Systems, Inc.

355 Goodpasture Island Road, Suite 200

Eugene, Oregon 97401


We describe Dr.Fill, a program that solves American-style crossword puzzles.

From a technical perspective, Dr.Fill works by converting crosswords to weighted

csps, and then using a variety of novel techniques to find a solution. These techniques include generally applicable heuristics for variable and value selection, a variant of limited discrepancy search, and postprocessing and partitioning ideas. Branch and bound is not used, as it was incompatible with postprocessing and was determined experimentally to be of little practical value. Dr.Fill’s performance on crosswords from the American Crossword Puzzle Tournament suggests that it ranks among the top fifty or so crossword solvers in the world.

1. Introduction In recent years, there has been interest in solving constraint-satisfaction problems, or csps, where some of the constraints are “soft” in that while their satisfaction is desirable, it is not strictly required in a solution. As an example, if a construction problem is modeled as a csp, it may be possible to overutilize a particular labor resource by paying the associated workers overtime. While not the cheapest way to construct the artifact in question, the corresponding solution is certainly viable in practice.

Soft constraints can be modeled by assigning a cost to violating any such constraint, and then looking for that solution to the original csp for which the accumulated cost is minimized.

By and large, work on these systems has been primarily theoretical as various tech- niques for solving these “weighted” csps (wcsps) are considered and evaluated without the experimental support of an underlying implementation on a real-world problem. Theoret- ical complexity results have been obtained, and the general consensus appears to be that some sort of branch-and-bound method should be used by the solver, where the cost of one potential solution is used to bound and thereby restrict the subsequent search for possible improvements.

Our goal in this paper is to evaluate possible wcsp algorithms in a more “practical” setting, to wit, the development of a program (Dr.Fill) designed to solve American-style crossword puzzles. Based on the search engine underlying Dr.Fill, our basic conclusions

are as follows:

c 2011 AI Access Foundation. All rights reserved.


1. We present specific variable- and value-selection heuristics that improve the effective- ness of the search enormously.

2. The most effective search technique appears to be a modification of limited discrepancy search (lds) (Harvey & Ginsberg, 1995).

3. Branch-and-bound appears not to be a terribly effective solution technique for at least some problems of this sort.

4. Postprocessing complete candidate solutions improves the effectiveness of the search.

A more complete description of the crossword domain can be found in Section 2.2;

example crosswords appear in Figures 1 and 2. The overall view we will take is that, given both a specific crossword clue c and possible solution word or “fill” f, there is an associated score p(f |c) that gives the probability that the fill is correct, given the clue. Assuming that these probabilities are independent for different clues, the probability that a collection of fills solves a puzzle correctly is then simply p(fi |ci ) (1) i where fi is the fill entered in response to clue ci. Dr.Fill’s goal is to find a set of fills that is legal (in that intersecting words share a letter at the square of intersection) while maximizing (1).

For human solvers, p(f |c) will in general be zero except for a handful of candidate fills that conform to full domain knowledge. Thus a “1973 nonfiction best seller about a woman with multiple personalities” must be “Sybil”; a 3-letter “Black Halloween animal” might be “bat” or “cat”, and so on. For Dr.Fill, complete domain knowledge is impractical and much greater use is made of the crossing words, as the csp solver exploits the hard constraints in the problem to restrict the set of candidate solutions.

Dr.Fill’s performance as a solver is comparable to (but significantly faster than) all but the very best human solvers. In solving New York Times crosswords (which increase in difficulty from Monday to Saturday, with the large Sunday puzzles comparable to Thursdays in difficulty), Dr.Fill generally solves Monday to Wednesday puzzles fairly easily, does well on Friday and Saturday puzzles, but often struggles with Thursday and Sunday puzzles.

These puzzles frequently involve some sort of a “gimmick” where the clues or fill have been modified in some nonstandard way in order to make the puzzle more challenging. When run on puzzles from the American Crossword Puzzle Tournament, the annual national gathering of top solvers in the New York City area, Dr.Fill’s performance puts it in the top fifty or so of the approximately six hundred solvers who typically attend.

The outline of this paper is as follows. Preliminaries are contained in the next section, both formal preliminaries regarding csps in Section 2.1 and a discussion of crosswords in Section 2.2. Section 2.3 discusses crosswords as csps specifically, including a description of the variety of ways in which crosswords differ from the problems typically considered by the constraint satisfaction community.

The heuristics used by Dr.Fill are described in Section 3, with value-selection heuristics the topic of Section 3.1 and variable-selection heuristics the topic of Section 3.2. The

852 Dr.Fill: A Crossword Solver

techniques for both value and variable selection can be applied to wcsps generally, although it is not clear how dependent their usefulness is on the crossword-specific features described in Section 2.3.

Our modification of lds is described in Section 4, and is followed in Section 5 with our first discussion of the experimental performance of our methods. Algorithmic extensions involving postprocessing are discussed in Section 6, which also discusses the reasons that branch-and-bound techniques are not likely to work well in this domain. Branch-andbound and postprocessing are not compatible but the arguments against branch-and-bound are deeper than that. Section 7 describes the utility of splitting a crossword into smaller problems when the associated constraint graph disconnects, an idea dating back to work of Freuder and Quinn (1985) but somewhat different in the setting provided by lds.

Section 8 concludes by describing related and future work, including the earlier crossword solvers Proverb (Littman, Keim, & Shzaeer, 2002) and WebCrow (Ernandes, Angelini, & Gori, 2005), and the Jeopardy-playing program Watson (Ferrucci, Brown, Chu-Carroll, Fan, Gondek, Kalyanpur, Lally, Murdock, Nyberg, Prager, Schlaefer, & Welty, 2010).

2. Preliminaries In this section, we give a brief overview of constraint satisfaction, crossword puzzles, and the relationship between the two.

2.1 Constraint Satisfaction In a conventional constraint-satisfaction problem, or csp, the goal is to assign values to variables while satisfying a set of constraints. The constraints indicate that certain values for one variable, say v1, are inconsistent with other specific values for a different variable v2.

Map coloring is a typical example. If a particular country is colored red, then neighboring countries are not permitted to be the same color.

We will formulate crossword solving by associating a variable to each word in the crossword, with the value of the variable being the associated fill. The fact that the first letter of the word at 1-Across has to match the first letter of the word at 1-Down corresponds to a constraint between the two variables in question.

A basic csp, then, consists of a set V of variables, a set D of domains, one for each variable, from which the variables’ values are to be taken, and a set of constraints.

Definition 2.1 Given a set of domains D and set V of variables, an n-ary constraint λ is a pair (T, U ) where T ⊆ V is of size n and U is a subset of the allowed sets of values for the variables in V. An assignment S is a mapping from each variable v ∈ V to an element of v’s domain. For T ⊆ V, the restriction of S to T, to be denoted S|T, is the restriction of the mapping S to the set T. We will say that S satisfies the constraint (T, U ) if S|T ∈ U.

The constraint simply specifies the sets of values that are allowed for the various variables involved.

As an example, imagine coloring a map of Europe using the four colors red, green, blue and yellow. Now T might be the set {France, Spain} (which share a border), D would assign the domain {red, green, blue, yellow} to each variable, and U would be the twelve ordered

853 Ginsberg

pairs of distinct colors from the four colors available. The associated constraint indicates that France and Spain cannot be colored the same color.

As we have remarked, we will take the view that the variables in a crossword correspond to the various slots into which words must be entered, and the values to be all of the words in some putative dictionary from which the fills are taken (but see the comments in Section 2.3). If two words intersect (e.g., 1-Across and 1-Down, typically), there is a binary constraint excluding all pairs of words for which shared letters differ.

Definition 2.2 A constraint-satisfaction problem, or csp, is a triple (V, D, Λ) where V is a set of variables, D gives a domain for each variable in V, and Λ is a set of constraints.

|V | will be called the size of the csp. If every constraint in Λ is either unary or binary, the csp is called a binary csp.

For a csp C, we will denote the set of variables in C by VC, the domains by DC, and the constraints by ΛC.

A solution to a csp is an assignment that satisfies every constraint in Λ.

Both map coloring and crossword solving as described above are binary csps.

There is an extensive literature on csps, describing both their applicability to a wide range of problems and various techniques that are effective in solving them. It is not practical for me to repeat that literature here, but there are two points that are particularly salient.

First, csps are generally solved using some sort of backtracking technique. Values are assigned to variables; when a conflict is discovered, a backtrack occurs that is designed to correct the source of the problem and allow the search to proceed. As with other chronologically based backtracking schemes, there are well-known heuristics for selecting the variables to be valued and the values to be used, and the most effective backtracking techniques use some kind of nogood reasoning (Doyle, 1979; Ginsberg, Frank, Halpin, & Torrance, 1990, and many others) to ensure that the backtrack will be able to make progress.

We will need to formalize this slightly.

Definition 2.3 Let C be a csp, and suppose that v is a variable in VC and x a value in the associated domain in DC. By C|v=x we will denote the csp obtained by setting v to x.

In other words, C|v=x = (VC − v, DC, Λ) where Λ consists of all constraints λ such that for some constraint (T, U ) ∈ ΛC

–  –  –

C|v=x will be called the restriction of C to v = x.

The notation may be intimidating but the idea is simple: Values permitted by constraints in the new problem are just those permitted in the old problem, given that we have decided to set v to x. So if an original constraint doesn’t mention v (the top line in (2)), the new constraint is unchanged. If v is mentioned, we see which values for the other variables are allowed, given that v itself is known to take the value x.

–  –  –

Definition 2.4 Let S be a partial solution to a csp C, in that S maps some of the variables in VC to elements of DC. The restriction of C to S, to be denoted C|S, is the csp obtained by successively restricting C to each of the assignments in S as in Definition 2.3.

This definition is well-defined because we obviously have:

–  –  –

Backtracking csp solvers work by selecting variables, trying various values for the variables so selected, and then recursively solving the restricted problems generated by setting the variable in question to the value chosen.

The second general point we would like to make regarding csp solvers is that most implementations use some kind of forward checking to help maintain consistency as the search proceeds. As an example, suppose that we are about to assign a value x to a variable v, but that if we do this, then every possible value for some other variable v will be eliminated by the constraints. In this case, we can obviously eliminate x as a value for v.

It is worth formalizing this a bit, although we do so only in the most general of terms.

Definition 2.6 A propagation mechanism π is a mapping from csps to csps that does not change the variables, so that π(V, D, Λ) = (V, D, Λ ) for any (V, D, Λ). We also require that for any variable in V, the associated domain in D is a subset of the associated domain in D, and if λ = (T, U ) ∈ Λ, there must be a λ = (T, U ) ∈ Λ with U ⊆ U. We will say that π is sound if, for any csp C and solution S to C, S is also a solution to π(C).

The propagation mechanism strengthens the constraints in the problem, and may reduce some of the variable domains as well. It is sound if it never discards a solution to the original csp.

A wide range of propagation mechanisms has been discussed in the literature. Simplest, of course, is to simply eliminate variable values that can be shown to violate one of the constraints in the problem. Iterating this idea recursively until quiescence (Mackworth,

1977) leads to the well known AC-3 algorithm, which preserves arc consistency as the csp is solved.

Pages:   || 2 | 3 | 4 | 5 |   ...   | 7 |

Similar works:

«University of Miami Scholarly Repository Open Access Dissertations Electronic Theses and Dissertations 2008-04-27 Solo Techniques for Unaccompanied Pizzicato Jazz Double Bass Larry James Ousley University of Miami, ousley@bellsouth.net Follow this and additional works at: http://scholarlyrepository.miami.edu/oa_dissertations Recommended Citation Ousley, Larry James, Solo Techniques for Unaccompanied Pizzicato Jazz Double Bass (2008). Open Access Dissertations. Paper 96. This Open access is...»

«THOMAS DUNCKERLEY AND ENGLISH FREEMASONRY THOMAS DUNCKERLEY AND ENGLISH FREEMASONRY by Susan Mitchell Sommers PICKERING & CHATTO 2012 Published by Pickering & Chatto (Publishers) Limited 21 Bloomsbury Way, London WC1A 2TH 2252 Ridge Road, Brookfield, Vermont 05036-9704, USA www.pickeringchatto.com All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or...»

«Dedication This edition is dedicated to my old Dungeons & Dragons group, The Mutants of the Round Table (you know who you are), friends and family, and all those who supported me with encouragement, comments, reviews, and suggestions. Books by D.L. Morrese ~*~ ~Stories of the Warden's World~ An Android Dog’s Tale Defying Fate (Combined eBook Edition) The Warden Threat (Defying Fate Part 1) The Warden War (Defying Fate Part 2) Amy’s Pendant Disturbing Clockwork ~*~ ~Adventures of the Brane...»

«Mission report of a field trip to Iran 23 August 4 September 2010 Dr Stéphane Ostrowski WCS September 2010 Wildlife Conservation Society 2300 Southern Boulevard • Bronx, NY 10460 Cover photos: 1. Participants to a workshop on wildlife chemical immobilization, Parvar Protected 1 2 Area, Semnan Province, 25 August 2010.2. From left to right: Dr. Babak Jourabchian (CACP), Dr. Stephane Ostrowski (WCS), 3 4 Mr. Mohammad Farahdinia (CACP), and Mr. Ali Khani (DoE), discussing the technical aspects...»

«3 Whose Apple Is It Anyway! Copyright © 2014 Linda F. Williams. All rights reserved. No portion of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means — electronic, mechanical, photocopy, recording, scanning, or other — except for brief quotations in critical reviews or articles, or as specifically allowed by the U. S. Copyright Act of 1976, as amended, without the prior written permission of the publisher. Published by Whose Apple Press...»

«Determination of Texture From Individual Grain Orientation Measurements John E. Blendell, Mark D. Vaudin and Edwin R. Fuller, Jr. Ceramics Division Materials Science and Engineering Laboratory National Institute of Standards and Technology Gaithersburg, MD 20899 USA Abstract We present a technique for determining the texture of a polycrystalline material based on the measurement of the orientation of a number of individual grains. We assumed that the sample has fiber (i.e. axisymmetric) texture...»

«Proposal for a Scintillator Tile Hodoscope for PANDA Version 1.1 K. Goetzen, H. Orth, G. Schepers, L. Schmitt, C. Schwarz, A. Wilms Abstract In this document a new detector in place of the barrel time-of-flight detector is proposed. This detector is based on small scintillator tiles read out by silicon photomultipliers. The motivation in terms of physics and technical benefits are summarized. Details of the detector layout are given. 1 CONTENTS 2 Contents 1 Introduction 3 2 Physics Motivation...»

«Behaviour 152 (2015) 335–357 brill.com/beh Non-reciprocal but peaceful fruit sharing in wild bonobos in Wamba Shinya Yamamoto a,b,∗ a Graduate School of Intercultural Studies, Kobe University, 1-2-1 Tsurukabuto, Nada-ku, 657-8501 Kobe, Japan b Wildlife Research Center, Kyoto University, Yoshida-honmachi, Sakyo-ku, 606-8501 Kyoto, Japan * Author’s e-mail address: shinyayamamoto1981@gmail.com Accepted 30 December 2014; published online 29 January 2015 Abstract Food sharing is considered to...»

«Contract reference no.: ALA/95/21-B7-3010/37 Beneficiaries: LALITPUR SUB-METROPOLITAN CITY and KHOKANA VILLAGE DEVELOPMENT COMMITTEE Contractor: City of Chester Periodic Report 15 January 2002 – 14 January 2004 Urban Management and Economic Diversification Project Contents Abbreviations 1. Technical section (a) Project Particulars (b) Summary (c) Working conditions (d) Reporting Tables Table 4 Activity 1 Institutional development Activity 2 Orientation/training Seminar at Chester Activity 3...»

«Trauma Therapists in Israel A Qualitative Study into Personal, Familial and Societal Sources of A Priori Countertransference Yvonne Tauber Trauma Therapists in Israel: A Qualitative Study into Personal, Familial and Societal Sources of A Priori Countertransference ISBN 978-965-91302-0-7 Copyright © 2008, Yvonne Tauber Printed by Gildeprint, Enschede All rights reserved. No part of this publication may be reproduced in any form by any electronic or mechanical means (including photocopying,...»

«Fast Track Trade Authority Must Be Replaced To Deliver Trade Agreement that Can Deliver Broad Benefits March 23, 2015 Dear Senator: Last fall, our organizations were joined by nearly 600 other unions and environmental, consumer, faith, family farm, civil rights, seniors, LGBT and other civil society organizations on a letter outlining the features of a trade authority mechanism that we would support. As negotiations on a prospective trade authority bill continue, we wanted to bring these...»

«Chapter 3 Glaucoma The Eyes Have It by Tim Root Is that one Due to budget cuts in mine? the ophtho department, all our white coats will No, be marked with sorry. corporate logos. I’m afraid Nike, Would you prefer Reebok, and Adidas Viagra, Valtrex, have already been or Monistat-7? claimed by the senior doctors. Crud! What’s left? Introduction to Glaucoma by Tim Root, M.D. Glaucoma is a disease where the optic nerve dies. We are not sure why or how this happens (there are many mechanical,...»

<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.