Skip site navigation (1) Skip section navigation (2)

Re: Any better plan for this query?..

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dimitri <dimitrik(dot)fr(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Any better plan for this query?..
Date: 2009-05-20 08:11:37
Message-ID: 1242807097.27960.39.camel@ebony.2ndQuadrant (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
On Tue, 2009-05-19 at 23:54 -0400, Robert Haas wrote:

> I don't think it's a good idea to write off the idea of implementing
> this optimization at some point.  I see a lot of queries that join one
> fairly large table against a whole bunch of little tables, and then
> sorting the results by a column that is indexed in the big table.  

Agreed it's a common use case.

> The
> optimizer handles this by sequentially scanning the big table, hash
> joining against all of the little tables, and then sorting the output,
> which is pretty silly (given that all of the tables fit in RAM and are
> in fact actually cached there).  If there is a LIMIT clause, then it
> might instead index-scan the big table, do the hash joins, and then
> sort the already-ordered results.  This is better because at least
> we're not sorting the entire table unnecessarily but it's still poor.

The Hash node is fully executed before we start pulling rows through the
Hash Join node. So the Hash Join node will know at execution time
whether or not it will continue to maintain sorted order. So we put the
Sort node into the plan, then the Sort node can just ask the Hash Join
at execution time whether it should perform a sort or just pass rows
through (act as a no-op).

The cost of the Sort node can either be zero, or pro-rated down from the
normal cost based upon what we think the probability is of going
multi-batch, which would vary by work_mem available.

 Simon Riggs 
 PostgreSQL Training, Services and Support

In response to


pgsql-performance by date

Next:From: Robert HaasDate: 2009-05-20 11:17:26
Subject: Re: Any better plan for this query?..
Previous:From: Robert HaasDate: 2009-05-20 03:54:19
Subject: Re: Any better plan for this query?..

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group