queries on xmin

From: Matt Amos <zerebubuth(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: queries on xmin
Date: 2009-06-11 10:25:49
Message-ID: 79d9e4e90906110325h619c493rb57e5370bd5f2f88@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

the openstreetmap project (http://osm.org/) recently moved from using
mysql to postgres and we're trying to improve some of our tools using
the new functionality that postgres provides.

in particular, we are dumping changes to the database at short
intervals (currently every minute, hour and day [1,2]) so that 3rd
party sites can use this to keep up-to-date with the main database. it
previously worked by examining the timestamp of each modified element,
but this is no longer practical due to new features in the
openstreetmap API which can cause long-running transactions [3].

we've been working out a scheme based on taking txid_snapshots at
short intervals and dumping the new rows (due to the way it's
implemented, all edits are inserted rows) and querying xmin. the query
looks something like this:

select id,version from (nodes|ways|relations) where timestamp > (now()
- '1 hour'::interval) and xmin in (...)

and we build up the txid list from the two snapshots we're dumping
between on the client. however, we're finding that this becomes much
less efficient as the txid list becomes longer. in an effort to reduce
the query time we're looking to index the xmin column. it seems that
hash indexes are already supported on the txid type, but btree are not
[4].

the queries we're doing would usually be of the form "xmin in
previous_unfinished_txids or (xmin > previous_max_txid and xmin <=
current_max_txid and not in current_unfinished_txids)" except when
wrap-around occurs, so it would seem that a btree index would be
superior to building this list client-side and using a hash index.

what problems are we going to create for ourselves if we create a
btree index on xmin casted to int4? would it be as efficient to use a
hash index, create a temporary table of txids that we're querying with
a hash index and do an explicit join? have i missed the point
entirely?

many thanks,

matt

[1] http://wiki.openstreetmap.org/wiki/Planet.osm/diffs
[2] http://wiki.openstreetmap.org/wiki/OsmChange
[3] http://wiki.openstreetmap.org/wiki/OSM_Protocol_Version_0.6#Diff_upload:_POST_.2Fapi.2F0.6.2Fchangeset.2F.23id.2Fupload
[4] http://archives.postgresql.org/pgsql-general/2004-10/msg01474.php

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David 2009-06-11 10:38:49 Re: When to use cascading deletes?
Previous Message hubert depesz lubaczewski 2009-06-11 10:24:41 Re: Different handling of PL/pgSQL for-loop variables in 8.3.7 vs. 8.2.7 ???