contsel and gist

From: Ben <midfield(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: contsel and gist
Date: 2010-10-28 16:57:49
Message-ID: 42FF469A-0BB5-4CE4-90A0-D9A7EF3C0BB3@gmail.com
Lists: pgsql-hackers

hello --

I have a largish table (~8 billion rows) that makes use of the temporal period datatype and GiST indexes. I find that query plans are somewhat "unstable," in that slightly altering a query often causes a switch from using the index (usually via a bitmap scan) to a sequential scan. There is basically no situation in which a sequential scan is the right plan for the kinds of queries we run -- the table is so large that any sequential scan would take hours. (More details about our setup below.)

My guess is that this has to do with the selectivity estimate for the @> operator. I've looked and noticed that the selectivity functions for @> and the other period operators are basically stubs that return constant selectivities. My questions are:

1 - Am I wrong in my assessment? Are the constant contsel, areasel, etc. estimates hurting us?

2 - How hard would it be to implement contsel et al. properly for the period data type? I've read the GiST papers but find the eqsel code a bit hard to understand. (Would anyone be willing to help? The catalog query just below shows what we're currently wired to.)
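
For reference, here is the catalog query I used to check which estimators the operators are wired to -- oprrest and oprjoin in pg_catalog.pg_operator name the restriction and join estimators, so this is a plain catalog query that should work on any recent release:

postgres=> select oprname, oprleft::regtype as lefttype, oprright::regtype as righttype, oprrest, oprjoin
postgres-> from pg_operator where oprname = '@>';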

More details:

PostgreSQL 9.0 on a Linux x86_64 box with 24 GB RAM and software RAID-5. (Not ideal, I understand.) The main table is:

create table timeseries ( id integer not null, value float not null, timespan period not null );
create index timeseries_id on timeseries (id);
create index timeseries_idx_timespan on timeseries using gist (timespan);

In our dataset there are about 2000 distinct time series, each with its own id. The time series are piecewise-constant functions that we represent as (id, value, time interval) triples. The intervals are typically no more than a few seconds long, at most a few minutes; for each id, the intervals are non-overlapping and cover the total time period. There are about 8 billion rows of historical data, and about 3 million new rows arrive per day, spread relatively evenly across the ids. The database is updated once a day, with the new rows loaded in time order, so the historical data is essentially ordered by time. Other than that single daily update, the workload is basically read-only. The most important query for us is sampling the time series at (potentially irregular) grid points; example queries follow below.

There are some small side tables (under 150K rows) for things like the grid points to sample at and auxiliary data that augments the time series:

create table grids ( gridid integer not null, gridpt timestamp with time zone, primary key (gridid, gridpt) );
create table adjs1 ( id integer not null, timespan period not null, adj double precision not null);
create index adjs1_timespan on adjs1 using gist (timespan);
create index adjs1_id on adjs1 (id);

An example query that works:

postgres=> explain analyze select * from timeseries join grids on timespan @> gridpt where gridid = 2 and gridpt between '2006-10-12' and '2006-10-15';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=18082.74..33760441.77 rows=6525038912 width=48) (actual time=204.655..2498.152 rows=34275 loops=1)
   ->  Index Scan using grids_pkey on grids  (cost=0.00..76.64 rows=23 width=12) (actual time=0.059..0.559 rows=38 loops=1)
         Index Cond: ((gridid = 2) AND (gridpt >= '2006-10-12 00:00:00'::timestamp without time zone) AND (gridpt <= '2006-10-15 00:00:00'::timestamp without time zone))
   ->  Bitmap Heap Scan on timeseries  (cost=18082.74..1460749.52 rows=567395 width=36) (actual time=32.561..62.545 rows=902 loops=38)
         Recheck Cond: (timeseries.timespan @> grids.gridpt)
         ->  Bitmap Index Scan on timeseries_idx_timespan  (cost=0.00..17940.89 rows=567395 width=0) (actual time=32.184..32.184 rows=902 loops=38)
               Index Cond: (timeseries.timespan @> grids.gridpt)
 Total runtime: 2553.386 ms
(8 rows)

Time: 2555.498 ms

There are about 10-100 grid points between the selected times. This query plan looks good to me, and indeed it runs fairly fast.
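
As a sanity check on the stub-selectivity theory: if I'm reading geo_selfuncs.c correctly, contsel hardwires a selectivity of 0.001 and contjoinsel 0.0005, and those constants reproduce the estimates above exactly (the planner's row count for timeseries, 567,394,688, is visible in the seq scan in the next plan):

    567,394,688 x 0.001 (contsel)            ≈ 567,395        (bitmap scan estimate)
    23 x 567,394,688 x 0.0005 (contjoinsel)  = 6,525,038,912  (nested loop estimate)

So every @> qual is being costed as if it matched 0.1% of the table, regardless of what the data actually looks like.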

An example query that goes haywire:

postgres=> explain select * from timeseries as TS join grids on TS.timespan @> gridpt join adjs1 as AD1 on TS.id=AD1.id and AD1.timespan @> gridpt where gridid=2 and gridpt between '2006-10-19' and '2006-10-22';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=7166.37..2517194919.79 rows=83476436469 width=76)
   Hash Cond: (ts.id = ad1.id)
   Join Filter: (ts.timespan @> grids.gridpt)
   ->  Seq Scan on timeseries ts  (cost=0.00..10402235.88 rows=567394688 width=36)
   ->  Hash  (cost=2024.57..2024.57 rows=411344 width=40)
         ->  Nested Loop  (cost=4.36..2024.57 rows=411344 width=40)
               ->  Index Scan using grids_pkey on grids  (cost=0.00..76.64 rows=23 width=12)
                     Index Cond: ((gridid = 2) AND (gridpt >= '2006-10-19 00:00:00'::timestamp without time zone) AND (gridpt <= '2006-10-22 00:00:00'::timestamp without time zone))
               ->  Bitmap Heap Scan on adjs1 ad1  (cost=4.36..84.24 rows=36 width=28)
                     Recheck Cond: (ad1.timespan @> grids.gridpt)
                     ->  Bitmap Index Scan on adjs1_timespan  (cost=0.00..4.35 rows=36 width=0)
                           Index Cond: (ad1.timespan @> grids.gridpt)

I've omitted the EXPLAIN ANALYZE output because the query is too slow to run to completion.

I can circumvent the sequential scan with some outer-join trickery, but that seems like a hack, and although relatively quick it doesn't look like an ideal plan:

postgres=> explain analyze select * from timeseries as TS join grids on TS.timespan @> gridpt left outer join adjs1 as AD1 on TS.id=AD1.id and AD1.timespan @> gridpt where gridid=2 and gridpt between '2006-10-19' and '2006-10-22';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1068.80..2563197208.41 rows=83476436469 width=76) (actual time=145.504..2542.112 rows=34125 loops=1)
   Hash Cond: (ts.id = ad1.id)
   Join Filter: (ad1.timespan @> grids.gridpt)
   ->  Nested Loop  (cost=0.00..44001399.98 rows=6525038912 width=48) (actual time=7.079..1634.089 rows=34125 loops=1)
         ->  Index Scan using grids_pkey on grids  (cost=0.00..76.64 rows=23 width=12) (actual time=0.040..0.426 rows=38 loops=1)
               Index Cond: ((gridid = 2) AND (gridpt >= '2006-10-19 00:00:00'::timestamp without time zone) AND (gridpt <= '2006-10-22 00:00:00'::timestamp without time zone))
         ->  Index Scan using timeseries_idx_timespan on timeseries ts  (cost=0.00..1906008.58 rows=567395 width=36) (actual time=3.695..39.859 rows=898 loops=38)
               Index Cond: (ts.timespan @> grids.gridpt)
   ->  Hash  (cost=621.69..621.69 rows=35769 width=28) (actual time=138.374..138.374 rows=35769 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 2236kB
         ->  Seq Scan on adjs1 ad1  (cost=0.00..621.69 rows=35769 width=28) (actual time=0.009..63.904 rows=35769 loops=1)
 Total runtime: 2596.653 ms

What I would really like is to inform the planner that the row counts and the selectivity of the GiST-indexed operators are such that using the index is better in nearly every situation. I have run VACUUM ANALYZE, but it does not seem to help. Alternatively, is there some way of rephrasing the query (via subselects?) that would nudge the planner in the right direction? (One candidate workaround is sketched below.)
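
For what it's worth, the bluntest workaround I've come up with so far (only lightly tested, so consider it a sketch) is to disable sequential scans for just this one query; SET LOCAL only lasts for the enclosing transaction, so nothing else is affected:

begin;
set local enable_seqscan = off;
select * from timeseries as TS
    join grids on TS.timespan @> gridpt
    join adjs1 as AD1 on TS.id = AD1.id and AD1.timespan @> gridpt
where gridid = 2 and gridpt between '2006-10-19' and '2006-10-22';
commit;

A WITH query that materializes the small grids/adjs1 join first might also steer the planner, since CTEs are an optimization fence in this release, but that has the same hacky flavor as the outer join above.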

best regards, ben
