Re: query optimization

From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: query optimization
Date: 2012-04-26 19:38:11
Message-ID: jnc860$q30$1@dough.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote on 26.04.2012 21:17:
> Richard Kojedzinszky<krichy(at)tvnetwork(dot)hu> writes:
>> Dear list,
>> We have a database schema, which looks the same as the attached script.
>
>> When filling the tables with data, and skipping analyze on the table (so
>> pg_stats contains no records for table 'a'), the first select in the
>> script runs fast, but after an analyze the planner decides to sequence
>> scan tables b and c, thus making the query much slower. Can somebody help
>> me solving this issue, or tuning our installation to not to use sequence
>> scans in this case?
>
> Um ... did you analyze all the tables, or just some of them? I get
> sub-millisecond runtimes if all four tables have been analyzed, but it
> does seem to pick lousy plans if, say, only a and b have been analyzed.
>

Here it's similar to Richard's experience:

Before analyzing the four tables, the first statement yields this plan:

Merge Left Join (cost=504.89..2634509.91 rows=125000000 width=16) (actual time=0.103..0.108 rows=1 loops=1)
Merge Cond: (a.b = b.id)
-> Sort (cost=504.89..506.14 rows=500 width=8) (actual time=0.043..0.043 rows=1 loops=1)
Sort Key: a.b
Sort Method: quicksort Memory: 17kB
-> Bitmap Heap Scan on a (cost=12.14..482.47 rows=500 width=8) (actual time=0.028..0.029 rows=1 loops=1)
Recheck Cond: (id = 4)
-> Bitmap Index Scan on a_idx1 (cost=0.00..12.01 rows=500 width=0) (actual time=0.021..0.021 rows=1 loops=1)
Index Cond: (id = 4)
-> Materialize (cost=0.00..884002.52 rows=50000000 width=8) (actual time=0.041..0.057 rows=5 loops=1)
-> Merge Join (cost=0.00..759002.52 rows=50000000 width=8) (actual time=0.037..0.051 rows=5 loops=1)
Merge Cond: (b.id = c.id)
-> Index Scan using b_idx1 on b (cost=0.00..4376.26 rows=100000 width=4) (actual time=0.016..0.018 rows=5 loops=1)
-> Materialize (cost=0.00..4626.26 rows=100000 width=4) (actual time=0.017..0.022 rows=5 loops=1)
-> Index Scan using c_idx1 on c (cost=0.00..4376.26 rows=100000 width=4) (actual time=0.014..0.017 rows=5 loops=1)
Total runtime: 0.209 ms

This continues to stay the plan for about 10-15 repetitions, then it turns to this plan

Hash Right Join (cost=2701.29..6519.30 rows=1 width=16) (actual time=79.604..299.227 rows=1 loops=1)
Hash Cond: (b.id = a.b)
-> Hash Join (cost=2693.00..6136.00 rows=100000 width=8) (actual time=79.550..265.251 rows=100000 loops=1)
Hash Cond: (b.id = c.id)
-> Seq Scan on b (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.011..36.158 rows=100000 loops=1)
-> Hash (cost=1443.00..1443.00 rows=100000 width=4) (actual time=79.461..79.461 rows=100000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 2735kB
-> Seq Scan on c (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.010..32.930 rows=100000 loops=1)
-> Hash (cost=8.28..8.28 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Index Scan using a_idx1 on a (cost=0.00..8.28 rows=1 width=8) (actual time=0.010..0.012 rows=1 loops=1)
Index Cond: (id = 4)
Total runtime: 299.564 ms

(I guess autovacuum kicked in, because that the same plan I get when running analyze on all four tables right after populating them)

And the second one yields this one here (Regardless of analyze or not):

QUERY PLAN
Nested Loop Left Join (cost=0.00..16.89 rows=1 width=16) (actual time=0.027..0.031 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.00..16.57 rows=1 width=12) (actual time=0.020..0.022 rows=1 loops=1)
-> Index Scan using a_idx1 on a (cost=0.00..8.28 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1)
Index Cond: (id = 4)
-> Index Scan using b_idx1 on b (cost=0.00..8.28 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
Index Cond: (a.b = id)
-> Index Scan using c_idx1 on c (cost=0.00..0.31 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
Index Cond: (b.id = id)
Total runtime: 0.104 ms

My version:
PostgreSQL 9.1.3, compiled by Visual C++ build 1500, 32-bit
Running on Windows XP SP3

shared_buffers = 768MB
work_mem = 24MB
effective_cache_size = 1024MB

All other (relevant) settings are on defaults

Regards
Thomas

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Craig James 2012-04-26 19:49:31 auto-vacuum vs. full table update
Previous Message Tom Lane 2012-04-26 19:17:18 Re: query optimization