Re: fast DISTINCT or EXIST

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tilo Buschmann <mailinglist(dot)postgresql(dot)performance(at)b-n-w(dot)org>
Cc: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: fast DISTINCT or EXIST
Date: 2007-04-07 16:39:38
Message-ID: 9915.1175963978@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tilo Buschmann <mailinglist(dot)postgresql(dot)performance(at)b-n-w(dot)org> writes:
>> Arjen van der Meijden <acmmailing(at)tweakers(dot)net> writes:
>>> SELECT ... FROM cd
>>> JOIN tracks ...
>>> WHERE cd.id IN (SELECT DISTINCT cd_id FROM (SELECT t.cd_id FROM tracks t
>>> WHERE t.tstitle @@ plainto_tsquery('simple','education') LIMIT 30)
>>> as foo LIMIT 10)

> Unfortunately, the query above will definitely not work correctly, if
> someone searches for "a" or "the".

Well, the "incorrectness" is only that it might deliver fewer than the
hoped-for ten CDs ... but that was a completely arbitrary cutoff anyway,
no? I think in practice this'd give perfectly acceptable results.

> Actually, I hoped to find an alternative, that does not involve
> DISTINCT.

You could try playing around with GROUP BY rather than DISTINCT; those
are separate code paths and will probably give you different plans.
But I don't think you'll find that GROUP BY does any better on this
particular measure of yielding rows before the full input has been
scanned.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Arjen van der Meijden 2007-04-07 17:28:52 Re: fast DISTINCT or EXIST
Previous Message Tilo Buschmann 2007-04-07 16:24:07 Re: fast DISTINCT or EXIST