Re: query optimization

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Kellerer <spam_eater(at)gmx(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: query optimization
Date: 2012-04-26 20:08:55
Message-ID: 13222.1335470935@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thomas Kellerer <spam_eater(at)gmx(dot)net> writes:
> Tom Lane wrote on 26.04.2012 21:17:
>> Um ... did you analyze all the tables, or just some of them? I get
>> sub-millisecond runtimes if all four tables have been analyzed, but it
>> does seem to pick lousy plans if, say, only a and b have been analyzed.

> Here it's similar to Richard's experience:
> Before analyzing the four tables, the first statement yields this plan:
> [ merge joins ]
> This continues to stay the plan for about 10-15 repetitions, then it turns to this plan
> [ hash joins ]

Hmm. I see it liking the merge-join plan (with minor variations) with
or without analyze data, but if just some of the tables have been
analyzed, it goes for the hash plan which is a good deal slower. The
cost estimates aren't that far apart though. In any case, the only
reason the merge join is so fast is that the data is perfectly ordered
in each table; on a less contrived example, it could well be a lot
slower.

> And the second one yields this one here (Regardless of analyze or not):

Yeah, the trick there is that it's valid to re-order the joins, since
they're both left joins.

In git HEAD I get something like this:

regression=# explain analyze select * from a left join (b inner join c on b.id = c.id) on a.b = b.id where a.id = 4;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..17.18 rows=1 width=16) (actual time=0.024..0.026 rows=1 loops=1)
-> Index Scan using a_idx1 on a (cost=0.00..8.38 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1)
Index Cond: (id = 4)
-> Nested Loop (cost=0.00..8.80 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1)
-> Index Only Scan using b_idx1 on b (cost=0.00..8.38 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
Index Cond: (id = a.b)
Heap Fetches: 1
-> Index Only Scan using c_idx1 on c (cost=0.00..0.41 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
Index Cond: (id = b.id)
Heap Fetches: 1
Total runtime: 0.080 ms
(11 rows)

but 9.1 and older are not smart enough to do it like that when they
can't re-order the joins.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2012-04-26 20:11:07 Re: auto-vacuum vs. full table update
Previous Message Steve Crawford 2012-04-26 20:00:19 Re: auto-vacuum vs. full table update