Using indexes for partial index builds

From: Paul Norman <penorman(at)mac(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Using indexes for partial index builds
Date: 2013-02-02 10:05:40
Message-ID: 048801ce012c$dc68d650$953a82f0$@mac.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,
After a discussion on IRC in #postgresql, I had a feature suggestion and it
was suggested I write it up here.

I have a large (200GB, 1.7b rows) table with a number of columns, but the
two of interest here are a hstore column, tags and a postgis geometry
column, geom.

There is a GIN index on tags and a gist index on geom. These took about
36-48 hours to build in total. Obviously index building on a table this size
is not trivial.

Periodically I want to do a number of specialized queries on objects with a
particular tag or in a particular area. To do this I often want to create a
partial index. For example, I created the index btree ((tags ->
'name_1'::text) text_pattern_ops) WHERE tags ? 'name_1'::text. My
understanding is to create this index PostgreSQL does a scan of the entire
table, even though the GIN index on tags could be used to identify which
rows could belong in the index. Where the WHERE condition selects only a
small portion of the table this is scanning a lot more data than is
necessary.

Another case where it would be useful is when I am conducting a detailed
analysis of some aspect of the rows in a particular city. This leads to all
the queries being of the form SELECT ... FROM ... WHERE
is_in_my_area(geom)[1].

My current project is doing analysis involving addresses. The ability to
create an index like btree((tags -> 'addr:housenumber'), (tags ->
'addr:street'), (tags -> 'addr:city')) WHERE is_in_my_area(geom) in a
reasonable time would allow me to use a view instead of copying the local
area to a temporary table and indexing that table. The local area is about
350k rows, or about 0.02% of the database.

[1] The actual function for determining if it's in my area is long and not
really essential to the point here.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2013-02-02 10:30:12 Re: proposal - assign result of query to psql variable
Previous Message Pavel Stehule 2013-02-02 09:23:21 proposal 9.4 plpgsql: allows access to call stack from GET DIAGNOSTICS statement