Re: Any better plan for this query?..

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dimitri <dimitrik(dot)fr(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Any better plan for this query?..
Date: 2009-05-20 08:11:37
Message-ID: 1242807097.27960.39.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On Tue, 2009-05-19 at 23:54 -0400, Robert Haas wrote:

> I don't think it's a good idea to write off the idea of implementing
> this optimization at some point. I see a lot of queries that join one
> fairly large table against a whole bunch of little tables, and then
> sorting the results by a column that is indexed in the big table.

Agreed it's a common use case.

> The
> optimizer handles this by sequentially scanning the big table, hash
> joining against all of the little tables, and then sorting the output,
> which is pretty silly (given that all of the tables fit in RAM and are
> in fact actually cached there). If there is a LIMIT clause, then it
> might instead index-scan the big table, do the hash joins, and then
> sort the already-ordered results. This is better because at least
> we're not sorting the entire table unnecessarily but it's still poor.

The Hash node is fully executed before we start pulling rows through the
Hash Join node. So the Hash Join node will know at execution time
whether or not it will continue to maintain sorted order. So we put the
Sort node into the plan, then the Sort node can just ask the Hash Join
at execution time whether it should perform a sort or just pass rows
through (act as a no-op).

The cost of the Sort node can either be zero, or pro-rated down from the
normal cost based upon what we think the probability is of going
multi-batch, which would vary by work_mem available.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2009-05-20 11:17:26 Re: Any better plan for this query?..
Previous Message Robert Haas 2009-05-20 03:54:19 Re: Any better plan for this query?..