Skip site navigation (1) Skip section navigation (2)

Re: Optimize date query for large child tables: GiST or GIN?

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: David Jarvis <thangalin(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thombrown(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Date: 2010-05-21 18:49:36
Message-ID: 4BF6D5C0.2030902@gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
David Jarvis wrote:
> Hi, Yeb.
>
> This is starting to go back to the design I used with MySQL:
>
>     * YEAR_REF - Has year and station
>     * MONTH_REF - Has month, category, and yea referencer
>     * MEASUREMENT - Has month reference, amount, and day
>
> Normalizing by date parts was fast. Partitioning the tables by year 
> won't do much good -- users will probably choose 1900 to 2009, 
> predominately.
Ok, in that case it is a bad idea.
> I thought about splitting the data by station by category, but that's 
> ~73000 tables. My understanding is that PostgreSQL uses files per 
> index, which would be messy at the OS level (Linux 2.6.31). Even by 
> station alone is 12139 tables, which might be tolerable for now, but 
> with an order of magnitude more stations on the distant horizon, it 
> will not scale.
Yes, I've read a few times now that PG's partitioning doesn't scale 
beyond a few 100 partitions.
> I also thought about splitting the data by station district by 
> category -- there are 79 districts, yielding 474 child tables, which 
> is ~575000 rows per child table. Most of the time I'd imagine only one 
> or two districts would be selected. (Again, hard to know exactly.)
I agee with Matthew Wakeling in a different post: its probably wise to 
first see how fast things can get by using indexes. Only if that fails 
to be fast, partitioning might be an option. (Though sequentially 
scanning 0.5M rows is not cheap).

I experimented a bit with a doy and year function.

-- note: leap year fix must still be added
create or replace function doy(timestamptz) RETURNS float8
as 'select extract(doy from $1);'
language sql
immutable
strict;
create or replace function year(timestamptz) RETURNS float8
as 'select extract(year from $1);'
language sql
immutable
strict;

\d parent
             Table "public.parent"
 Column |           Type           | Modifiers
--------+--------------------------+-----------
 t      | timestamp with time zone |
 y      | smallint                 |
Indexes:
    "doy_i" btree (doy(t))
    "year_i" btree (year(t))

A plan like the following is probably what you want

test=# explain select * from parent where doy(t) between 10 and 20 and 
year(t) between 1900 and 2009;
                                                                               
QUERY 
PLAN                                                                                

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on parent  (cost=9.95..14.97 rows=1 width=10)
   Recheck Cond: ((year(t) >= 1900::double precision) AND (year(t) <= 
2009::double precision) AND (doy(t) >= 10::double precision) AND (doy(t) 
<= 20::double precision))
   ->  BitmapAnd  (cost=9.95..9.95 rows=1 width=0)
         ->  Bitmap Index Scan on year_i  (cost=0.00..4.85 rows=10 width=0)
               Index Cond: ((year(t) >= 1900::double precision) AND 
(year(t) <= 2009::double precision))
         ->  Bitmap Index Scan on doy_i  (cost=0.00..4.85 rows=10 width=0)
               Index Cond: ((doy(t) >= 10::double precision) AND (doy(t) 
<= 20::double precision))
(7 rows)

regards,
Yeb Havinga




In response to

Responses

pgsql-performance by date

Next:From: Matthew WakelingDate: 2010-05-21 19:12:12
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Previous:From: Yeb HavingaDate: 2010-05-21 18:21:21
Subject: Re: Optimize date query for large child tables: GiST or GIN?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group