Re: Multiple Uniques

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Markus Schaber <schabios(at)logi-track(dot)com>, PostgreSQL Performance List <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Multiple Uniques
Date: 2004-09-10 06:34:28
Message-ID: 877jr2tyx7.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Neil Conway <neilc(at)samurai(dot)com> writes:
>
> > How many cycles are we really talking about, though? I have a patch
> > which I'll send along in a few days which implements a similar
> > optimization: if a subselect is referenced by EXISTS or IN, we can
> > discard DISTINCT and ORDER BY clauses in the subquery
>
> I don't think either of those is worth doing. ORDER BY in a sub-select
> isn't even legal SQL, much less probable, so why should we expend even
> a nanosecond to optimize it? The DISTINCT is more of a judgement call,
> but my thought when I looked at it originally is that it would give
> people a possible optimization knob. If you write DISTINCT in an IN
> clause then you can get a different plan (the IN reduces to an ordinary
> join) that might or might not be better than without it. We shouldn't
> take away that possibility just on the grounds of nanny-ism.

Just one user's 2c: Consider the plight of dynamically constructed queries.
The queries within "IN" clauses are particularly likely to be constructed this
way. The query in the IN clause could be a constructed in an entirely separate
function without any idea that it will be used within an IN clause.

E.g. something like:

$accessible_ids = $security_manager->get_accessible_ids_query($this->userid);
$selected_columns = $this->selected_columns_parameters();
$query = "select $selected_columns where id IN ($accessible_ids)"

In an ideal world functionally equivalent queries should always generate
identical plans. Of course there are limitations, it's not an ideal world, but
as much as possible it should be possible to write code without having to
worry whether the optimizer will be able to figure it out.

--
greg

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Pierre-Frédéric Caillaud 2004-09-10 06:36:16 Question on Byte Sizes
Previous Message Tom Lane 2004-09-10 04:17:11 Re: Costly "Sort Key" on indexed timestamp column