Re: Searchable chess positions in a Postgress DB

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Sidney Cadot <sidney(at)jigsaw(dot)nl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Searchable chess positions in a Postgress DB
Date: 2012-04-12 21:50:37
Message-ID: 4F874E2D.4030709@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 11/04/12 21:24, Gavin Flower wrote:
> On 11/04/12 19:15, Sidney Cadot wrote:
>> Dear all,
>>
>> As a hobby project, I am toying around with a database containing
>> about 5 million chess games. On average, these games have about 80
>> positions (~ 40 moves by both black and white), which means there are
>> about 400 million chess positions in there.
>>
>> I have written code to extract these positions, and now I want to put
>> them into a Postgres database. Specifically, I want to do this in a
>> way that allows *fast* lookups of positions, e.g. "give me all
>> positions that have a White King on c4 and either a Black Bishop or
>> White Knight on f7".
>>
>> Currently, my "Positions" table looks like this:
>>
>> Column | Type | Modifiers
>> -------------------+---------+-----------
>> gameindex | integer | not null
>> plyindex | integer | not null
>> pseudofenboard | text | not null
>> fenside | text | not null
>> fencastling | text | not null
>> fenenpassant | text | not null
>> possiblemovecount | integer | not null
>> isincheck | boolean | not null
>> Indexes:
>> "positions_pkey" PRIMARY KEY, btree (gameindex, plyindex)
>> Foreign-key constraints:
>> "positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES
>> games(gameindex)
>>
>> The "PseudoFenBoard" field currently holds a string describing the
>> position. For example, the starting position of chess looks like this:
>>
>> "rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR"
>>
>> This design allows me to formulate the kind of positional queries that
>> I want (by using regular expression matching), but executing them will
>> involve a slow, linear traversal of the 400M table rows, which is not
>> desirable.
>>
>> I am toying around with the ugly idea to make a "Positions" table that
>> has a single field for each of the squares, e.g.
>>
>> CREATE TABLE Position2 (
>> GameIndex INTEGER NOT NULL,
>> PlyIndex INTEGER NOT NULL,
>> a1 "char" NOT NULL,
>> a2 "char" NOT NULL,
>> -- (60 fields defs omitted)
>> h7 "char" NOT NULL,
>> h8 "char" NOT NULL
>> );
>>
>> This would allow the creation of indices on each of the 64 fields
>> separately, which should help to achieve near-instantaneous position
>> query performance, especially after gathering proper statistics for
>> all the field-specific indices.
>>
>> I realize that this design is quite ugly, so I would be interested to
>> hear if there are nicer alternatives that can perform equally well.
>>
>> Also, above I use the 1-byte "char" type. Is this the only type in
>> PostGres that is guaranteed to be just a single byte, or are there
>> better alternatives? A 13-state enum would be best (listing the 6
>> white pieces, 6 black pieces, and 'empty' states for every square on
>> the board) but as I understand from the documentation, enums always up
>> take 4 bytes per entry.
>>
>> Any ideas for improvement would be greatly appreciated.
>>
>
> How aboutsomething like the following (game and postion would have
> more fields in practice, like comments and where played)?
>
>
> DROP TABLE IF EXISTS game CASCADE;
>
> CREATE TABLE game
> (
> id int PRIMARY KEY,
> name_white text,
> name_black text,
> played timestamptz
> );
>
> CREATE TABLE position
> (
> id int PRIMARY KEY,
> game_id int REFERENCES game (id),
> ply int
> );
>
> CREATE TABLE piece
> (
> id int PRIMARY KEY,
> position_id int REFERENCES position (id),
> rank char, -- 1...8 from white's perspective
> file char, -- a...h
> white boolean,
> type char -- P.R,N,B,K,Q
> );
>
> CREATE UNIQUE INDEX square ON piece (rank, file, type, white);
>
>
> SELECT
> p.position_id
> FROM
> piece p
> WHERE
> ( p.white
> AND p.type = 'K'
> AND p.file = 'c'
> AND p.rank = '4'
> )
> AND
> (
> ((NOT p.white AND p.type = 'B') OR (p.white AND p.type = 'K'))
> AND p.file = 'f'
> AND p.rank = '7'
> );
>
>
> Cheers,
> Gavin
>
There was a blatantly obvious flaw in the above query: the pices
checked, should belong to the same position!

That I only discovered the flaw when I mentally checked the SQL on the
way to work the folowing day.

The following, hopefully, fixes the problem

SELECT
p1.position_id
FROM
piece AS p1 JOIN piece AS p2 USING (position_id)
WHERE
(
p1.white
AND p1.type = 'K'
AND p1.file = 'c'
AND p1.rank = '4'
)
AND
(
((NOT p2.white AND p2.type = 'B') OR (p2.white AND p2.type = 'K'))
AND p2.file = 'f'
AND p2.rank = '7'
);

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Nolan 2012-04-13 00:07:33 Re: Searchable chess positions in a Postgress DB
Previous Message Bret Stern 2012-04-12 21:04:49 Installer Questions (NSIS)