Re: Query performance problems with partitioned tables

From: Guillaume Cottenceau <gc(at)mnc(dot)ch>
To: Andreas Haumer <andreas(at)xss(dot)co(dot)at>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query performance problems with partitioned tables
Date: 2007-04-30 13:54:39
Message-ID: 87fy6ige74.fsf@meuh.mnc.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Andreas Haumer <andreas 'at' xss.co.at> writes:

> > Well, how can the planner know inside which partition the wanted
> > row is? There might be no data, say, inside a couple of
> > partitions in the past before finding the wanted row, in which
> > case 3 partitions in the past must be scanned.
> >
>
> I think the planner could do the following:
>
> a) It could make a better decision in which direction to scan
> the partitions (depending on sort order involved in the query)
>
> b) It could stop scanning as soon as there can not be any further
> resulting row according to the CHECK constraints given on the tables.

About these precise points, I'll let a pg guru give an answer.

> Look at this example:
>
> testdb_part=> select ts from mwdb.t_mv where zr=3622 and ts > '2006-01-01 00:00:00' order by ts asc limit 1;
> ts
> ------------------------
> 2006-01-01 01:00:00+01
> (1 row)
>
> testdb_part=> explain analyze select ts from mwdb.t_mv where zr=3622 and ts > '2006-01-01 00:00:00' order by ts asc limit 1;
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=15843.41..15843.41 rows=1 width=8) (actual time=152.476..152.478 rows=1 loops=1)
> -> Sort (cost=15843.41..15865.39 rows=8795 width=8) (actual time=152.472..152.472 rows=1 loops=1)
> Sort Key: mwdb.t_mv.ts
> -> Result (cost=0.00..15267.23 rows=8795 width=8) (actual time=0.102..122.540 rows=11629 loops=1)
> -> Append (cost=0.00..15267.23 rows=8795 width=8) (actual time=0.098..76.140 rows=11629 loops=1)
> -> Index Scan using pk_mv_zr_ts on t_mv (cost=0.00..8.27 rows=1 width=8) (actual time=0.022..0.022 rows=0 loops=1)
> Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01 00:00:00+01'::timestamp with time zone))

[...]

> -> Index Scan using pk_mv_200704 on t_mv_200704 t_mv (cost=0.00..1209.39 rows=545 width=8) (actual time=0.061..2.296 rows=711 loops=1)
> Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01 00:00:00+01'::timestamp with time zone))
> Total runtime: 153.195 ms
> (40 rows)
>
>
> Table t_mv_200601 gets scanned first, which is fine.
>
> This already gives a row matching the given WHERE clause.
> It makes no sense to scan the other tables, as the query
> asks for one row only and all the other tables have timestamps
> larger than all the timestamps in table t_mv_200601 (according
> to the CHECK constraints for the partion tables)

I think this is the last claimed point which is incorrect. Pg has
no general guarantee the partitions actually create a disjoint
set, even with the CHECK constraints. Pg can only optimize by
avoiding scanning the partitions inside which no satisfactory
data could be found by the CHECK constraint, but I think it's not
possible (too complicated) to infer that any found row in your
other partitions would not be in the final resultset because of
1. the query's resultset order 2. the limit 3. the actual
conditions in the CHECK constraints (there is no direct way to
see that timestamps in your 200704 partition are greater than
timsteamp in your 200601 partition).

I guess some sort of pg guru would be needed here to clarify
things in a smart way, unlike me :)

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Neil Peter Braggio 2007-04-30 14:06:07 Re: Query performance problems with partitioned tables
Previous Message Richard Huxton 2007-04-30 13:45:05 Re: Query performance problems with partitioned tables