From: | "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com> |
---|---|
To: | "Hannes Eder" <hannes(at)hanneseder(dot)net> |
Cc: | "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: Hash Join-Filter Pruning using Bloom Filters |
Date: | 2008-11-02 22:50:26 |
Message-ID: | 36e682920811021450vaf7642bve38d1e5e1025ac60@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, Nov 2, 2008 at 5:36 PM, Hannes Eder <hannes(at)hanneseder(dot)net> wrote:
> On Sun, Nov 2, 2008 at 10:49 PM, Jonah H. Harris <jonah(dot)harris(at)gmail(dot)com> wrote:
>> Similarly, I
>> created a GUC to enable pruning, named bloom_pruning.
>
> I guess calls to bloom_filter_XXX should be surrounded by "if
> (bloom_pruning) ..." or a similar construct, i.e. make use of the GUC
> variable bloom_pruning in the rest of the code.
It's effective as-is for a preliminary patch. The GUC code is the
least of my worries.
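
For readers following the thread, here is a minimal sketch of what guarding Bloom filter calls with the GUC could look like. It is not the patch's code: the `BloomFilter` type, the sizes, the hash mixing, and the function names (`build_side_insert`, `probe_side_needs_lookup`) are all hypothetical, and the `bloom_pruning` variable here is only a plain C stand-in for the GUC of the same name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS   8192u   /* hypothetical filter size in bits */
#define BLOOM_HASHES 3u      /* hypothetical number of probes per key */

/* Stand-in for the bloom_pruning GUC (assumption: a plain global flag). */
static bool bloom_pruning = true;

typedef struct BloomFilter
{
    uint8_t bits[BLOOM_BITS / 8];
} BloomFilter;

/* 32-bit integer mixer (the murmur3 finalizer), used to derive bit positions. */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* i-th bit position for a key, using double hashing (h1 + i * h2). */
static uint32_t bloom_pos(uint32_t hashvalue, uint32_t i)
{
    uint32_t h1 = mix32(hashvalue);
    uint32_t h2 = mix32(hashvalue ^ 0x9e3779b9u) | 1u;
    return (h1 + i * h2) % BLOOM_BITS;
}

static void bloom_init(BloomFilter *bf)
{
    assert(bf != NULL);
    memset(bf->bits, 0, sizeof bf->bits);
}

static void bloom_add(BloomFilter *bf, uint32_t hashvalue)
{
    for (uint32_t i = 0; i < BLOOM_HASHES; i++)
    {
        uint32_t p = bloom_pos(hashvalue, i);
        bf->bits[p / 8] |= (uint8_t) (1u << (p % 8));
    }
}

/* False means "definitely absent"; true means "possibly present". */
static bool bloom_maybe_contains(const BloomFilter *bf, uint32_t hashvalue)
{
    for (uint32_t i = 0; i < BLOOM_HASHES; i++)
    {
        uint32_t p = bloom_pos(hashvalue, i);
        if ((bf->bits[p / 8] & (1u << (p % 8))) == 0)
            return false;
    }
    return true;
}

/* Build side: record the key only when pruning is enabled. */
static void build_side_insert(BloomFilter *bf, uint32_t hashvalue)
{
    if (bloom_pruning)
        bloom_add(bf, hashvalue);
}

/* Probe side: true if the tuple still needs a real hash table lookup. */
static bool probe_side_needs_lookup(const BloomFilter *bf, uint32_t hashvalue)
{
    if (!bloom_pruning)
        return true;            /* pruning disabled: never skip a lookup */
    return bloom_maybe_contains(bf, hashvalue);
}
```

With the flag off, every probe falls through to the normal lookup, so the filter can only ever skip work and never change the join result.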
> Can you provide some figures on the performance impact of the bloom filter?
It depends on the queries. I've been trying to find a good suite of
hash join tests, but haven't had much luck. Here's a simple test case:
CREATE TABLE t1 (id INTEGER PRIMARY KEY, x INTEGER);
CREATE TABLE t2 (id INTEGER PRIMARY KEY, x INTEGER);
INSERT INTO t1 (SELECT ge, ge % 100 FROM generate_series(1, 1000000) ge);
INSERT INTO t2 (SELECT * FROM t1);
VACUUM ANALYZE;
SELECT COUNT(*)
FROM t1, t2
WHERE t1.id = t2.id
AND t1.x < 30
AND t2.x > 10;
SET bloom_pruning TO off;
EXPLAIN
SELECT COUNT(*)
FROM t1, t2
WHERE t1.id = t2.id
AND t1.x < 30
AND t2.x > 10;
\timing
SELECT COUNT(*)
FROM t1, t2
WHERE t1.id = t2.id
AND t1.x < 30
AND t2.x > 10;
\timing
EXPLAIN
SELECT *
FROM t1, t2
WHERE t1.id = t2.id
AND t1.x < 30
AND t2.x > 10;
\timing
SELECT *
FROM t1, t2
WHERE t1.id = t2.id
AND t1.x < 30
AND t2.x > 10;
\timing
SET bloom_pruning TO on;
\timing
SELECT COUNT(*)
FROM t1, t2
WHERE t1.id = t2.id
AND t1.x < 30
AND t2.x > 10;
\timing
EXPLAIN
SELECT *
FROM t1, t2
WHERE t1.id = t2.id
AND t1.x < 30
AND t2.x > 10;
\timing
SELECT *
FROM t1, t2
WHERE t1.id = t2.id
AND t1.x < 30
AND t2.x > 10;
\timing
-- Without Pruning
Time: 1142.843 ms
Time: 1567.355 ms
-- With Pruning
Time: 891.557 ms
Time: 1269.634 ms
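
To put the timings above in relative terms, the small helper below computes the percentage reduction in elapsed time. It is only a sketch: the function name `pct_reduction` is hypothetical, and pairing the first timing in each group with the COUNT(*) query and the second with the SELECT * query is an assumption based on the order of the script.

```c
#include <assert.h>

/* Percentage by which after_ms is lower than before_ms. */
static double pct_reduction(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return 100.0 * (before_ms - after_ms) / before_ms;
}
```

Calling `pct_reduction(1142.843, 891.557)` gives roughly 22%, and `pct_reduction(1567.355, 1269.634)` gives roughly 19%.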
--
Jonah H. Harris, Senior DBA
myYearbook.com