Bad query plans for queries on partitioned table

From: Julian Mehnle <julian(at)mehnle(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Bad query plans for queries on partitioned table
Date: 2007-12-04 19:44:08
Message-ID: 200712041944.13543.julian@mehnle.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

I have a large database with e-mail meta-data (no bodies) for over 100
million messages. I am running PostgreSQL 8.2.4 on a server with 2GB of
RAM (shared_buffers = 240MB, temp_buffers = 128MB, work_mem = 256MB,
maintenance_work_mem = 256MB). I have the data split in two separate
tables, "email" and "email_extras":

Table "public.email"
Column | Type | Modifiers
-------------------+-----------------------------+-----------
id | bigint | not null
load_id | integer | not null
ts | timestamp without time zone | not null
ip | inet | not null
mfrom | text | not null
helo | text |

Table "public.email_extras"
Column | Type | Modifiers
-------------------+-----------------------------+-----------
id | bigint | not null
ts | timestamp without time zone | not null
size | integer | not null
hdr_from | text |

Each of these tables has been partitioned equally based on the "ts"
(timestamp) field into two dozen or so tables, each covering one week of
messages. For testing purposes, I have only one week's partition filled
for each of the "email" and "email_extras" tables (email_2007_week34
{,extras}).

Now if I perform the following simple join on the "email" and "email_
extras" tables ...

SELECT
count(*)
FROM
email
INNER JOIN email_extras USING (id, ts)
WHERE
mfrom <> hdr_from;

then I get the following horrendously inefficient plan:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=391396890.89..391396890.90 rows=1 width=0)
-> Merge Join (cost=9338881.64..349156398.02 rows=16896197148 width=0)
Merge Cond: ((public.email_extras.id = public.email.id) AND (public.email_extras.ts = public.email.ts))
Join Filter: (public.email.mfrom <> public.email_extras.hdr_from)
-> Sort (cost=4592966.95..4658121.33 rows=26061752 width=48)
Sort Key: public.email_extras.id, public.email_extras.ts
-> Append (cost=0.00..491969.52 rows=26061752 width=48)
-> Seq Scan on email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week13_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week14_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week15_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week16_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week17_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week18_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week19_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week20_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week21_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week22_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week23_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week24_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week25_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week26_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week27_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week28_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week29_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week30_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week31_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week32_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week33_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week34_extras email_extras (cost=0.00..491597.12 rows=26052512 width=33)
-> Seq Scan on email_2007_week35_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week36_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week37_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week38_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week39_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Seq Scan on email_2007_week40_extras email_extras (cost=0.00..13.30 rows=330 width=48)
-> Sort (cost=4745914.69..4811071.87 rows=26062872 width=48)
Sort Key: public.email.id, public.email.ts
-> Append (cost=0.00..644732.72 rows=26062872 width=48)
-> Seq Scan on email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week13 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week14 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week15 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week16 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week17 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week18 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week19 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week20 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week21 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week22 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week23 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week24 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week25 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week26 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week27 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week28 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week29 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week30 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week31 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week32 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week33 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week34 email (cost=0.00..644349.12 rows=26052512 width=33)
-> Seq Scan on email_2007_week35 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week36 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week37 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week38 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week39 email (cost=0.00..13.70 rows=370 width=48)
-> Seq Scan on email_2007_week40 email (cost=0.00..13.70 rows=370 width=48)
(68 rows)

However, if I restrict the query to just the partitions that actually do
have data in them ...

SELECT
count(*)
FROM
email_2007_week34
INNER JOIN email_2007_week34_extras USING (id, ts)
WHERE
mfrom <> hdr_from;

then I get a much better plan that uses a hash join:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4266338.94..4266338.95 rows=1 width=0)
-> Hash Join (cost=1111362.80..4266336.07 rows=1145 width=0)
Hash Cond: ((email_2007_week34.ts = email_2007_week34_extras.ts) AND (email_2007_week34.id = email_2007_week34_extras.id))
Join Filter: (email_2007_week34.mfrom <> email_2007_week34_extras.hdr_from)
-> Seq Scan on email_2007_week34 (cost=0.00..644349.12 rows=26052512 width=33)
-> Hash (cost=491597.12..491597.12 rows=26052512 width=33)
-> Seq Scan on email_2007_week34_extras (cost=0.00..491597.12 rows=26052512 width=33)
(7 rows)

Yes, I have `ANALYZE`d the database before running the queries.

How come the query planner gets thrown off that far by the simple table
partitioning? What can I do to put the query planner back on the right
track?

Julian.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Julian Mehnle 2007-12-04 20:03:14 Re: Bad query plans for queries on partitioned table
Previous Message Gregory Stark 2007-12-04 19:28:42 Re: RAID arrays and performance