Re: Recheck condition

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Josh Harrison <joshques(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Postgresql General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Recheck condition
Date: 2007-12-01 10:13:42
Message-ID: 87wsryk9ih.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

"Martijn van Oosterhout" <kleptog(at)svana(dot)org> writes:

> On Fri, Nov 30, 2007 at 11:27:24AM -0500, Josh Harrison wrote:
>> Thanks for your reply
>> Is there a way to get them not to use the
>> heap for intermediate result and go to heap only for final data? This will
>> drastically improve the performance but Im not sure if postgres can do that?
>> Will creating the index in a different way and/or rewriting the query in a
>> different way achieve this result?
>
> I'm trying to imagine what it would take to avoid the heap access after
> the index scan I don't think it's possible. It would require that the
> bitmaps generated by the bitmap scan have the person_id attached and
> then have the bitmap AND operation only happen if the person IDs match.
> No such machinary currently exists.

I think you're describing a star schema join. This is a common checklist item
for data warehousing databases.

The classic data warehouse has a table like "person" which has the info you're
looking for, and dozens of tables with person_id and possibly some associated
data. In some cases those tables don't even have any other data, the mere
existence of the person_id in that table is enough.

So a typical query could look like something like:

select *
from person
where person_id in (select person_id from people_who_used_service_in_the_past)
and person_id in (select person_id from people_with_big_balances)
and person_id in (select person_id from people_...)
and person_id not in (select person_id from people_who_unsubscribed)
and person_id not in (select person_id from people_who_we_mailed_last_week)

The best plan for this is to gather up the person_ids in a kind of bitmap scan
with a bitmap of ids. And once the bitmap is done scan an index on person for
just the matching records. Postgres doesn't support anything like this (yet:).

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Gregory Stark 2007-12-01 11:37:31 Re: full_page_writes = off?
Previous Message Dave Page 2007-12-01 08:44:20 Re: libeay32.dll and libpq.dll