Re: Calculation of unused columns

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Calculation of unused columns
Date: 2009-10-18 22:26:58
Message-ID: 603c8f070910181526q2869eadbx4b8a805401013d13@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sun, Oct 18, 2009 at 4:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sun, Oct 18, 2009 at 1:59 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>> Even if country.id is a primary or unique key?
>
>> Well, we currently don't have any logic for making inferences based on
>> unique constraints.
>
> Huh?
> http://archives.postgresql.org/pgsql-committers/2009-09/msg00159.php
>
> Admittedly it's just one case and there's lots more to be done, but it's
> more than nothing.  So this is a *potential* argument for trying to trim
> subquery outputs.  What I'm not sure about is whether there are common
> cases where this would be applicable below a non-flattenable subquery.

Sorry, I have to stop writing emails when I'm half-asleep. Obviously
what we don't have is logic for making deductions based on *foreign
key* constraints, but that's not relevant here.

Maybe I should shut up before I say any more dumb things, but one
possible case where we don't currently do join removal but it would be
nice if we did is:

SELECT ... FROM a.x LEFT JOIN (SELECT bb.x, SUM(1) FROM bb GROUP BY
bb.x) b ON a.x = b.x;

Or even:

SELECT ... FROM a.x LEFT JOIN (SELECT DISTINCT ON (bb.x) ... FROM bb)
b ON a.x = b.x;

Your commit message for the join removal patch mentions
machine-generated SQL, but where join removal really comes up a lot
for me is when using views. I like to define a view that includes all
the columns that seem potentially useful and then let the user pick
which ones they'd like to see. The trouble is that you don't want to
incur the cost of computing the columns that the user doesn't select.
It's probably true that in MOST of the cases where this comes up, the
subquery can be flattened, from_collapse_limit permitting. But I
think there are other cases, too.

Another thing to keep in mind is that, in OLTP environments, it's
sometimes important to minimize the number of server round-trips. The
type of construction suggested by the OP might be someone's way of
gather two somewhat-unrelated values with a single query. Except
sometimes they only need one of them, but they end up paying for both
anyway. They could probably work around this with a little bit
different setup, but I don't think they're entirely wrong to find the
current behavior a little bit surprising.

...Robert

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2009-10-18 23:41:38 Re: Calculation of unused columns
Previous Message Tom Lane 2009-10-18 20:54:10 Re: Calculation of unused columns