From: "Tyrrill, Ed" <tyrrill_ed(at)emc(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject:
Date: 2007-06-25 21:28:32
Message-ID: A23190A408F7094FAF446C1538222F76041E1C28@avaexch01.avamar.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hey All,

I am testing upgrading our database from version 8.1 to 8.2. I ran our
worst performing query on this table, an outer join with an "is null"
condition, and I was happy to see it ran over four times faster. I also
noticed the explain analyze showed the planner chose to do sequential
scans on both tables. I realized I had forgotten to increase
default_statistics_target from the default 10, so I increased it to 100,
and ran "analyze". In 8.1 this sped things up significantly, but in 8.2
when I ran the query again it was actually slower. These tests were
done with 8.2.3.1 so I also loaded 8.2.4.1 for comparison. Here is the
explain analyze with default_statistics_target set to 10 on 8.2.3.1:

mdsdb=# explain analyze select backupobjects.record_id from
backupobjects left outer join backup_location using (record_id) where
backup_location.record_id is null;
QUERY
PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
Hash Left Join (cost=7259801.24..12768737.71 rows=1 width=8) (actual
time=651003.121..1312717.249 rows=10411 loops=1)
Hash Cond: (backupobjects.record_id = backup_location.record_id)
Filter: (backup_location.record_id IS NULL)
-> Seq Scan on backupobjects (cost=0.00..466835.63 rows=13716963
width=8) (actual time=0.030..95981.895 rows=13706121 loops=1)
-> Hash (cost=3520915.44..3520915.44 rows=215090944 width=8)
(actual time=527345.024..527345.024 rows=215090786 loops=1)
-> Seq Scan on backup_location (cost=0.00..3520915.44
rows=215090944 width=8) (actual time=0.048..333944.886 rows=215090786
loops=1)
Total runtime: 1312727.200 ms

And again with default_statistics_target set to 100 on 8.2.3.1:

mdsdb=# explain analyze select backupobjects.record_id from
backupobjects left outer join backup_location using (record_id) where
backup_location.record_id is null;
QUERY
PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
Merge Left Join (cost=38173940.88..41468823.19 rows=1 width=8) (actual
time=3256548.988..4299922.345 rows=10411 loops=1)
Merge Cond: (backupobjects.record_id = backup_location.record_id)
Filter: (backup_location.record_id IS NULL)
-> Sort (cost=2258416.72..2292675.79 rows=13703629 width=8) (actual
time=74450.897..85651.707 rows=13706121 loops=1)
Sort Key: backupobjects.record_id
-> Seq Scan on backupobjects (cost=0.00..466702.29
rows=13703629 width=8) (actual time=0.024..40939.762 rows=13706121
loops=1)
-> Sort (cost=35915524.17..36453251.53 rows=215090944 width=8)
(actual time=3182075.661..4094748.788 rows=215090786 loops=1)
Sort Key: backup_location.record_id
-> Seq Scan on backup_location (cost=0.00..3520915.44
rows=215090944 width=8) (actual time=17.905..790499.303 rows=215090786
loops=1)
Total runtime: 4302591.325 ms

With 8.2.4.1 I get the same plan and performance with
default_statistics_target set to either 10 or 100:

mdsdb=# explain analyze select backupobjects.record_id from
backupobjects left outer join backup_location using (record_id) where
backup_location.record_id is null;
QUERY
PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
Merge Left Join (cost=37615266.46..40910145.54 rows=1 width=8) (actual
time=2765729.582..3768519.658 rows=10411 loops=1)
Merge Cond: (backupobjects.record_id = backup_location.record_id)
Filter: (backup_location.record_id IS NULL)
-> Sort (cost=2224866.79..2259124.25 rows=13702985 width=8) (actual
time=101118.216..113245.942 rows=13706121 loops=1)
Sort Key: backupobjects.record_id
-> Seq Scan on backupobjects (cost=0.00..466695.85
rows=13702985 width=8) (actual time=10.003..67604.564 rows=13706121
loops=1)
-> Sort (cost=35390399.67..35928127.03 rows=215090944 width=8)
(actual time=2664596.049..3540048.500 rows=215090786 loops=1)
Sort Key: backup_location.record_id
-> Seq Scan on backup_location (cost=0.00..3520915.44
rows=215090944 width=8) (actual time=7.110..246561.900 rows=215090786
loops=1)
Total runtime: 3770428.750 ms

And for reference here is the same query with 8.1.5.6:

mdsdb=# explain analyze select backupobjects.record_id from
backupobjects left outer join backup_location using (record_id) where
backup_location.record_id is null;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-----------------
Merge Left Join (cost=37490897.67..41269533.13 rows=13705356 width=8)
(actual time=5096492.430..6588745.386 rows=10411 loops=1)
Merge Cond: ("outer".record_id = "inner".record_id)
Filter: ("inner".record_id IS NULL)
-> Index Scan using backupobjects_pkey on backupobjects
(cost=0.00..518007.92 rows=13705356 width=8) (actual
time=32.020..404517.133 rows=13706121 loops=1)
-> Sort (cost=37490897.67..38028625.03 rows=215090944 width=8)
(actual time=5096460.396..6058937.259 rows=215090786 loops=1)
Sort Key: backup_location.record_id
-> Seq Scan on backup_location (cost=0.00..3520915.44
rows=215090944 width=8) (actual time=0.020..389038.442 rows=215090786
loops=1)
Total runtime: 6599215.268 ms
(8 rows)

Based on all this we will be going with 8.2.4.1, but it seems like
currently the query planner isn't choosing the best plan for this case.

Thanks,
Ed

Responses

  • Re: at 2007-06-25 21:56:54 from Stephen Frost
  • Re: at 2007-06-25 22:10:15 from Tom Lane

Browse pgsql-performance by date

  From Date Subject
Next Message Tyrrill, Ed 2007-06-25 21:48:55 Non-optimal query plan with 8.2
Previous Message Bryan Murphy 2007-06-25 20:18:43 startup caching suggestions