Skip site navigation (1) Skip section navigation (2)

Getting different number of results when using hashjoin on/off

From: Mario Weilguni <mweilguni(at)sime(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Getting different number of results when using hashjoin on/off
Date: 2005-11-28 13:00:32
Message-ID: 200511281400.32103.mweilguni@sime.com (view raw or flat)
Thread:
Lists: pgsql-hackers
I've a problem that might be a bug in the core system (hashjoins) or with ltree using gist-index, but I fail miserable to produce a useful testcase (using 8.1, worked in 8.0):

A query produces wrong (=0) results, when a different plan is enforced, I get a merge-join plan that looks similar, but produces the correct result (=16 rows).

I can post a queryplan, but cannot post the data itself since it's confidental (though I might be able to randomize some data and construct a self contained case, but this would take quite some time).


The working case is:
set enable_hashjoin to off;
 Seq Scan on foo1 cost=0.00..423583.57 rows=10810 width=4) (actual time=675.422..706.815 rows=16 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Merge Join  (cost=19.49..19.55 rows=1 width=0) (actual time=0.028..0.028 rows=0 loops=21619)
           Merge Cond: ("outer".str_id = "inner".id)
           ->  Sort  (cost=6.49..6.50 rows=5 width=4) (actual time=0.023..0.023 rows=0 loops=21619)
                 Sort Key: bz.str_id
                 ->  Bitmap Heap Scan on foo2 bz  (cost=2.02..6.43 rows=5 width=4) (actual time=0.012..0.012 rows=0 loops=21619)
                       Recheck Cond: (bid = $0)
                       ->  Bitmap Index Scan on foo2_bid_key1  (cost=0.00..2.02 rows=5 width=0) (actual time=0.009..0.009 rows=0 loops=21619)
                             Index Cond: (bid = $0)
           ->  Sort  (cost=13.00..13.01 rows=6 width=4) (actual time=0.002..0.003 rows=1 loops=136)
                 Sort Key: str.id
                 ->  Bitmap Heap Scan on structure str  (cost=2.02..12.92 rows=6 width=4) (actual time=0.095..0.097 rows=1 loops=1)
                       Recheck Cond: (path ~ '142.2330445.2330598.2330676.*'::lquery)
                       ->  Bitmap Index Scan on str_uk4  (cost=0.00..2.02 rows=6 width=0) (actual time=0.086..0.086 rows=1 loops=1)
                             Index Cond: (path ~ '142.2330445.2330598.2330676.*'::lquery)
 Total runtime: 707.019 ms

16 rows...


The failing case is:
set enable_hashjoin to on;
 Seq Scan on foo1 cost=0.00..421679.00 rows=10810 width=4) (actual time=154.663..154.663 rows=0 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Hash Join  (cost=8.47..19.46 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=21619)
           Hash Cond: ("outer".id = "inner".str_id)
           ->  Bitmap Heap Scan on structure str  (cost=2.02..12.92 rows=6 width=4) (actual time=0.100..30.095 rows=1 loops=1)
                 Recheck Cond: (path ~ '142.2330445.2330598.2330676.*'::lquery)
                 ->  Bitmap Index Scan on str_uk4  (cost=0.00..2.02 rows=6 width=0) (actual time=0.090..0.090 rows=1 loops=1)
                       Index Cond: (path ~ '142.2330445.2330598.2330676.*'::lquery)
           ->  Hash  (cost=6.43..6.43 rows=5 width=4) (actual time=0.032..0.032 rows=0 loops=1)
                 ->  Bitmap Heap Scan on foo2 bz  (cost=2.02..6.43 rows=5 width=4) (actual time=0.025..0.025 rows=0 loops=1)
                       Recheck Cond: (bid = $0)
                       ->  Bitmap Index Scan on foo2_bid_key1  (cost=0.00..2.02 rows=5 width=0) (actual time=0.021..0.021 rows=0 loops=1)
                             Index Cond: (bid = $0)
 Total runtime: 154.862 ms
No rows....

The query itself is quite simple:
select foo1.id
from foo1
where 
  foo1.datloesch is null
  and exists (select 1 
                from foo2 bz,
                     structure str
               where bz.bid=foo1.id
                 and str.id = bz.str_id
                 and str.path ~ '*.2330676.*'
              );

The path field is an "ltree" column, with an GIST index on it.


Any ideas what I could try to track this down?

Best regards,
	Mario Weilguni

Responses

pgsql-hackers by date

Next:From: Christopher Kings-LynneDate: 2005-11-28 13:12:26
Subject: Re: Getting different number of results when using hashjoin
Previous:From: Martijn van OosterhoutDate: 2005-11-28 12:32:02
Subject: Re: Using multi-row technique with COPY

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group