Query causing explosion of temp space with join involving partitioning

From: Krzysztof Nienartowicz <Krzysztof(dot)Nienartowicz(at)unige(dot)ch>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Query causing explosion of temp space with join involving partitioning
Date: 2010-05-20 13:35:56
Message-ID: AANLkTimdhGNxZHp8oaXhfjebSawqmmTacS0JrDJLUACO@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello,
I have an ORM-generated queries where parent table keys are used to
fetch the records from the child table (with relevant FK indicies),
where child table is partitioned. My understanding is that Postgres is
unable to properly use constraint exclusion to query only a relevant
table? Half of the join condition is propagated down, while the other
is not.

table sources has pk (sureyid,srcid), ts has fk(survey_pk,source_pk)
on source (sureyid,srcid) and another index with
survey_pk,source_pk,tstype (not used in the query).

This is very unfortunate as the queries are auto-generated and I
cannot move predicate to apply it directly to partitioned table.

The plan includes all the partitions, next snippet shows exclusion
works for the table when condition is used directly on the partitioned
table.

surveys-> SELECT t1.SURVEY_PK, t1.SOURCE_PK, t1.TSTYPE, t1.VALS
surveys-> FROM sources t0 ,TS t1 where
surveys-> (t0.SURVEYID = 16 AND t0.SRCID >= 203510110032281 AND
t0.SRCID <= 203520107001677 and 16 = t1.SURVEY_PK AND t0.SRCID =
t1.SOURCE_PK ) ORDER BY t0.SURVEYID ASC, t0.SRCID ASC
surveys->
surveys-> ;
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=11575858.83..11730569.40 rows=3448336 width=60)
Merge Cond: (t0.srcid = t1.source_pk)
-> Index Scan using sources_pkey on sources t0
(cost=0.00..68407.63 rows=37817 width=12)
Index Cond: ((surveyid = 16) AND (srcid >=
203510110032281::bigint) AND (srcid <= 203520107001677::bigint))
-> Materialize (cost=11575858.83..11618963.03 rows=3448336 width=48)
-> Sort (cost=11575858.83..11584479.67 rows=3448336 width=48)
Sort Key: t1.source_pk
-> Append (cost=0.00..11049873.18 rows=3448336 width=48)
-> Index Scan using ts_pkey on ts t1
(cost=0.00..8.27 rows=1 width=853)
Index Cond: (survey_pk = 16)
-> Index Scan using ts_part_bs3000l00000_ts_pkey
on ts_part_bs3000l00000 t1 (cost=0.00..8.27 rows=1 width=48)
Index Cond: (survey_pk = 16)
-> Bitmap Heap Scan on
ts_part_bs3000l00001_cg0346l00000 t1 (cost=5760.36..1481735.21
rows=462422 width=48)
Recheck Cond: (survey_pk = 16)
-> Bitmap Index Scan on
ts_part_bs3000l00001_cg0346l00000_ts_pkey (cost=0.00..5644.75
rows=462422 width=0)
Index Cond: (survey_pk = 16)
-> Bitmap Heap Scan on
ts_part_cg0346l00001_cg0816k00000 t1 (cost=5951.07..1565423.79
rows=488582 width=48)
Recheck Cond: (survey_pk = 16)
-> Bitmap Index Scan on
ts_part_cg0346l00001_cg0816k00000_ts_pkey (cost=0.00..5828.93
rows=488582 width=0)
Index Cond: (survey_pk = 16)
-> Bitmap Heap Scan on
ts_part_cg0816k00001_cg1180k00000 t1 (cost=5513.54..1432657.90
rows=447123 width=48)
Recheck Cond: (survey_pk = 16)
-> Bitmap Index Scan on
ts_part_cg0816k00001_cg1180k00000_ts_pkey (cost=0.00..5401.75
rows=447123 width=0)
Index Cond: (survey_pk = 16)
-> Bitmap Heap Scan on
ts_part_cg1180k00001_cg6204k00000 t1 (cost=5212.63..1329884.46
rows=415019 width=48)
Recheck Cond: (survey_pk = 16)
-> Bitmap Index Scan on
ts_part_cg1180k00001_cg6204k00000_ts_pkey (cost=0.00..5108.87
rows=415019 width=0)
Index Cond: (survey_pk = 16)
-> Bitmap Heap Scan on
ts_part_cg6204k00001_lm0022n00000 t1 (cost=5450.37..1371917.76
rows=428113 width=48)
Recheck Cond: (survey_pk = 16)
-> Bitmap Index Scan on
ts_part_cg6204k00001_lm0022n00000_ts_pkey (cost=0.00..5343.35
rows=428113 width=0)
Index Cond: (survey_pk = 16)
-> Bitmap Heap Scan on
ts_part_lm0022n00001_lm0276m00000 t1 (cost=5136.71..1298542.32
rows=405223 width=48)
Recheck Cond: (survey_pk = 16)
-> Bitmap Index Scan on
ts_part_lm0022n00001_lm0276m00000_ts_pkey (cost=0.00..5035.40
rows=405223 width=0)
Index Cond: (survey_pk = 16)
-> Bitmap Heap Scan on
ts_part_lm0276m00001_lm0584k00000 t1 (cost=5770.98..1525737.42
rows=476204 width=48)
Recheck Cond: (survey_pk = 16)
-> Bitmap Index Scan on
ts_part_lm0276m00001_lm0584k00000_ts_pkey (cost=0.00..5651.93
rows=476204 width=0)
Index Cond: (survey_pk = 16)
-> Bitmap Heap Scan on
ts_part_lm0584k00001_sm0073k00000 t1 (cost=4536.03..1043949.51
rows=325647 width=48)
Recheck Cond: (survey_pk = 16)
-> Bitmap Index Scan on
ts_part_lm0584k00001_sm0073k00000_ts_pkey (cost=0.00..4454.62
rows=325647 width=0)
Index Cond: (survey_pk = 16)
-> Index Scan using ts_part_sm0073k00001_ts_pkey
on ts_part_sm0073k00001 t1 (cost=0.00..8.27 rows=1 width=48)
Index Cond: (survey_pk = 16)
(46 rows)

Check to see if the exclusion works and yes, it does.

surveys=> explain SELECT t1.SURVEY_PK, t1.SOURCE_PK, t1.TSTYPE, t1.VALS
surveys-> FROM TS t1 WHERE t1.SURVEY_PK =16 AND t1.SOURCE_PK>=
202970108014045 AND t1.SOURCE_PK <= 202970108014909
surveys-> ORDER BY t1.SURVEY_PK ASC, t1.SOURCE_PK ASC
surveys-> ;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=9454.13..9459.91 rows=2313 width=48)
Sort Key: t1.source_pk
-> Result (cost=0.00..9324.88 rows=2313 width=48)
-> Append (cost=0.00..9324.88 rows=2313 width=48)
-> Index Scan using ts_pkey on ts t1 (cost=0.00..8.27
rows=1 width=853)
Index Cond: ((survey_pk = 16) AND (source_pk >=
202970108014045::bigint) AND (source_pk <= 202970108014909::bigint))
-> Index Scan using
ts_part_bs3000l00001_cg0346l00000_ts_pkey on
ts_part_bs3000l00001_cg0346l00000 t1 (cost=0.00..9316.61 rows=2312
width=48)
Index Cond: ((survey_pk = 16) AND (source_pk >=
202970108014045::bigint) AND (source_pk <= 202970108014909::bigint))
(8 rows)

Is there any workaround for that, except changing the query? Any plans
to implement it?

The second issue is connected to this one it looks like a bug to me:

I have around 30 clients running the same query with different
parameters, but the query always returns 1000 rows (boundary values
are pre-calculated,so it's like traversal of the equiwidth histogram
if it comes to srcid/source_pk) and the rows from parallel queries
cannot be overlapping. Usually query returns within around a second.
I noticed however there are some queries that hang for many hours and
what is most curious some of them created several GB of temp files.
The partition size is around 10M entries, there are around 10
partitions as I mentioned there is no way this query should fetch more
then 1000 entries. Some queries may span multiple, adjacent partitions
(but not the one above, as we can see). I tried to run this exploding
query without ordering, but it did not change anything, behaviour is
repeatable from the command line, if the query is divided by hand with
same parameters values that are returned are ok, within a second.
There are many AccessShareLock locks, all of them granted.
Killing the clients that issues the queries does not change much -
these are still running, DB immediate restart helps.

Environment: Rocks 5.2, kernel 2.6.18, PostgreSQL 8.4.2 on
x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20071124 (Red
Hat 4.1.2-42), 64-bit, client is JDBC.

The queries start to fail eventually due to the lack of space (over
500GB used by temp files), i.e. one of queries hangs for hours with
temp allocation like:

< ls -hs1 data/base/pgsql_tmp/
1.1G pgsql_tmp27571.0
1.1G pgsql_tmp27571.1
1.1G pgsql_tmp27571.10
1.1G pgsql_tmp27571.11
1.1G pgsql_tmp27571.12
1.1G pgsql_tmp27571.13
1.1G pgsql_tmp27571.14
1.1G pgsql_tmp27571.15
1.1G pgsql_tmp27571.16
1.1G pgsql_tmp27571.17
1.1G pgsql_tmp27571.18
1.1G pgsql_tmp27571.19
1.1G pgsql_tmp27571.2
1.0G pgsql_tmp27571.20
1.0G pgsql_tmp27571.21
1.1G pgsql_tmp27571.22
1.1G pgsql_tmp27571.23
1.1G pgsql_tmp27571.24
1.1G pgsql_tmp27571.25
1.1G pgsql_tmp27571.26
801M pgsql_tmp27571.27
1.1G pgsql_tmp27571.3
1.1G pgsql_tmp27571.4
1.1G pgsql_tmp27571.5
1.1G pgsql_tmp27571.6
1.1G pgsql_tmp27571.7
1.1G pgsql_tmp27571.8
1.1G pgsql_tmp27571.9
>

The transaction does not generate any updates/deletes.

Working queries (subsecond) have plan like this, condition pushed down
will use the index this time.

surveys=> explain SELECT t0.SURVEYID, t0.SRCID, t1.SURVEY_PK,
t1.SOURCE_PK, t1.TSTYPE, t1.VALS,
t1.CCDIDS, t1.FLAGS, t1.OBSTIME, t1.LEN, t1.VALUEERRORS
FROM sources t0 ,TS t1 where
(t0.SURVEYID = 16 AND t0.SRCID >= 202970108014045 AND t0.SRCID <=
202970108014909 and 16 = t1.SURVEY_PK AND t0.SRCID = t1.SOURCE_PK )
ORDER BY t0.SURVEYID ASC, t0.SRCID ASC
;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..68958.49 rows=17242 width=339)
Join Filter: (t0.srcid = t1.source_pk)
-> Index Scan using sources_pkey on sources t0 (cost=0.00..35.48
rows=1 width=12)
Index Cond: ((surveyid = 16) AND (srcid >=
202970108014045::bigint) AND (srcid <= 202970108014909::bigint))
-> Append (cost=0.00..68707.45 rows=17245 width=327)
-> Index Scan using ts_pkey on ts t1 (cost=0.00..8.27
rows=1 width=1665)
Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
-> Index Scan using ts_part_bs3000l00000_ts_pkey on
ts_part_bs3000l00000 t1 (cost=0.00..8.27 rows=1 width=327)
Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
-> Bitmap Heap Scan on ts_part_bs3000l00001_cg0346l00000 t1
(cost=40.29..9208.75 rows=2312 width=327)
Recheck Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
-> Bitmap Index Scan on
ts_part_bs3000l00001_cg0346l00000_ts_pkey (cost=0.00..39.71 rows=2312
width=0)
Index Cond: ((t1.survey_pk = 16) AND
(t1.source_pk = t0.srcid))
-> Bitmap Heap Scan on ts_part_cg0346l00001_cg0816k00000 t1
(cost=41.60..9729.55 rows=2443 width=327)
Recheck Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
-> Bitmap Index Scan on
ts_part_cg0346l00001_cg0816k00000_ts_pkey (cost=0.00..40.99 rows=2443
width=0)
Index Cond: ((t1.survey_pk = 16) AND
(t1.source_pk = t0.srcid))
-> Bitmap Heap Scan on ts_part_cg0816k00001_cg1180k00000 t1
(cost=39.25..8906.31 rows=2236 width=327)
Recheck Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
-> Bitmap Index Scan on
ts_part_cg0816k00001_cg1180k00000_ts_pkey (cost=0.00..38.69 rows=2236
width=0)
Index Cond: ((t1.survey_pk = 16) AND
(t1.source_pk = t0.srcid))
-> Bitmap Heap Scan on ts_part_cg1180k00001_cg6204k00000 t1
(cost=37.50..8266.11 rows=2075 width=327)
Recheck Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
-> Bitmap Index Scan on
ts_part_cg1180k00001_cg6204k00000_ts_pkey (cost=0.00..36.98 rows=2075
width=0)
Index Cond: ((t1.survey_pk = 16) AND
(t1.source_pk = t0.srcid))
-> Bitmap Heap Scan on ts_part_cg6204k00001_lm0022n00000 t1
(cost=38.44..8528.77 rows=2141 width=327)
Recheck Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
-> Bitmap Index Scan on
ts_part_cg6204k00001_lm0022n00000_ts_pkey (cost=0.00..37.91 rows=2141
width=0)
Index Cond: ((t1.survey_pk = 16) AND
(t1.source_pk = t0.srcid))
-> Bitmap Heap Scan on ts_part_lm0022n00001_lm0276m00000 t1
(cost=36.99..8071.29 rows=2026 width=327)
Recheck Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
-> Bitmap Index Scan on
ts_part_lm0022n00001_lm0276m00000_ts_pkey (cost=0.00..36.49 rows=2026
width=0)
Index Cond: ((t1.survey_pk = 16) AND
(t1.source_pk = t0.srcid))
-> Bitmap Heap Scan on ts_part_lm0276m00001_lm0584k00000 t1
(cost=40.80..9482.89 rows=2381 width=327)
Recheck Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
-> Bitmap Index Scan on
ts_part_lm0276m00001_lm0584k00000_ts_pkey (cost=0.00..40.21 rows=2381
width=0)
Index Cond: ((t1.survey_pk = 16) AND
(t1.source_pk = t0.srcid))
-> Bitmap Heap Scan on ts_part_lm0584k00001_sm0073k00000 t1
(cost=32.95..6488.95 rows=1628 width=327)
Recheck Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
-> Bitmap Index Scan on
ts_part_lm0584k00001_sm0073k00000_ts_pkey (cost=0.00..32.54 rows=1628
width=0)
Index Cond: ((t1.survey_pk = 16) AND
(t1.source_pk = t0.srcid))
-> Index Scan using ts_part_sm0073k00001_ts_pkey on
ts_part_sm0073k00001 t1 (cost=0.00..8.27 rows=1 width=327)
Index Cond: ((t1.survey_pk = 16) AND (t1.source_pk = t0.srcid))
(43 rows)

And to check if partition exclusion is used for the working query:
surveys=> explain select t0.SURVEY_PK, t0.SOURCE_PK, t0.TSTYPE,
t0.VALS from ts t0 where t0.SURVEY_pk = 16 AND t0.SOURCE_PK >=
202970108014045 AND t0.Source_pk <= 202970108014909 ;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..9324.88 rows=2313 width=48)
-> Append (cost=0.00..9324.88 rows=2313 width=48)
-> Index Scan using ts_pkey on ts t0 (cost=0.00..8.27
rows=1 width=853)
Index Cond: ((survey_pk = 16) AND (source_pk >=
202970108014045::bigint) AND (source_pk <= 202970108014909::bigint))
-> Index Scan using
ts_part_bs3000l00001_cg0346l00000_ts_pkey on
ts_part_bs3000l00001_cg0346l00000 t0 (cost=0.00..9316.61 rows=2312
width=48)
Index Cond: ((survey_pk = 16) AND (source_pk >=
202970108014045::bigint) AND (source_pk <= 202970108014909::bigint))
(6 rows)

Could you please advise how to cope with this? Should I file the bug?
Does any workaround exist?

Best Regards,
Krzysztof

Browse pgsql-bugs by date

  From Date Subject
Next Message Krzysztof Nienartowicz 2010-05-20 14:00:03 Query causing explosion of temp space with join involving partitioning
Previous Message Diffor 2010-05-20 12:12:59 Re: BUG #5466: Asia/Novosibirsk timezone problem