Re: a few crazy ideas about hash joins

From: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>
To: "Greg Stark" <stark(at)enterprisedb(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-03 18:44:50
Message-ID: 6EEA43D22289484890D119821101B1DF05190DEF@exchange20.mercury.ad.ubc.ca
Lists: pgsql-hackers

> > I would be especially interested in using a shared memory hash table
> > that *all* backends can use - if the table is mostly read-only, as
> > dimension tables often are in data warehouse applications. That would
> > give zero startup cost and significantly reduced memory.
>
> I think that's a non-starter due to visibility issues and handling
> inserts and updates. Even just reusing a hash from one execution in a
> later execution of the same plan would be tricky since we would have
> to expire it if the snapshot changes.

If your data set is nearly read-only, materialized views would be a
better way to go and would require no hash join changes.

The idea of perfect hash functions for dimension tables is very
interesting. If the data set is nearly static, it is possible to compute
one in a few minutes for a million-tuple table and then re-use it until
the data changes. Research has shown this is possible, but I do not know
if anyone has actually implemented it in a real DBMS. An implementation
could be something to try if there is interest.
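
To make the perfect hash idea concrete, here is a rough sketch of a
"hash and displace" construction over a static set of integer keys. It is
only an illustration, not PostgreSQL code and not taken from any specific
paper. The names build_perfect_hash, lookup and _h are made up for this
example, and the 64-bit mixing function is an arbitrary choice assumed to
behave like a good seeded hash.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def _h(key: int, seed: int, n: int) -> int:
    """Seeded 64-bit integer mix reduced to a slot in [0, n). Assumed hash."""
    x = (key + seed * 0x9E3779B97F4A7C15) & MASK64
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & MASK64
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & MASK64
    x ^= x >> 31
    return x % n


def build_perfect_hash(keys):
    """Build (seeds, slots) so every distinct key gets its own slot."""
    keys = list(dict.fromkeys(keys))  # drop duplicates, keep order
    n = len(keys)
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[_h(k, 0, n)].append(k)  # first level: seed 0 picks the bucket

    seeds = [0] * n
    slots = [None] * n
    # Place the largest buckets first, while most slots are still free.
    for b in sorted(range(n), key=lambda i: len(buckets[i]), reverse=True):
        bucket = buckets[b]
        if not bucket:
            break  # remaining buckets are empty
        seed = 1
        while True:
            placed = {}
            for k in bucket:
                s = _h(k, seed, n)
                if slots[s] is not None or s in placed:
                    break  # collision: try the next seed
                placed[s] = k
            else:
                break  # every key in this bucket found a free slot
            seed += 1
        seeds[b] = seed
        for s, k in placed.items():
            slots[s] = k
    return seeds, slots


def lookup(seeds, slots, key):
    """Return the slot index holding key, or None if key is not in the set."""
    n = len(slots)
    if n == 0:
        return None
    s = _h(key, seeds[_h(key, 0, n)], n)
    return s if slots[s] == key else None
```

A probe costs two hash evaluations and one key comparison, with no
collision chain to walk. The build does all the searching up front, which
is why it only pays off when the table changes rarely. Any insert or
update to the key set would require rebuilding (or at least invalidating)
the seeds.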

--
Ramon Lawrence
