Quick Links

RE: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field

From:	David Raymond <David(dot)Raymond(at)tomtom(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject:	RE: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field
Date:	2019-07-23 18:14:59
Message-ID:	VI1PR07MB5792B53637F1DDC0A8A30D2B87C70@VI1PR07MB5792.eurprd07.prod.outlook.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

Dropping the two foreign key constrains didn't seem to alter the plan. Doing the "SET max_parallel_workers_per_gather = 0;" thing did alter it and it's giving consistent results.

I'll see if I can manufacture a smaller dataset that produces similar results. Since I cut it down to just UUIDs I don't think there'd be a problem with sending that other than the size of it all.

mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1
from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
count | count
-------+-------
8,858 | 6,531
(1 row)

Time: 151.132 ms
mnr=> set max_parallel_workers_per_gather = 0;
SET
Time: 0.234 ms
mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
count | count
-------+-------
6,531 | 6,531
(1 row)

Time: 457.642 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=24232.21..24232.22 rows=1 width=16) (actual time=499.999..499.999 rows=1 loops=1)
Output: count(*), count(DISTINCT n.id)
-> Hash Semi Join (cost=16556.91..24208.38 rows=4766 width=16) (actual time=98.205..495.173 rows=6531 loops=1)
Output: n.id
Hash Cond: (n.id = n2h.n_id)
-> Hash Join (cost=13145.86..19761.59 rows=30750 width=32) (actual time=59.001..133.273 rows=30578 loops=1)
Output: n.id, n2a.n_id
Inner Unique: true
Hash Cond: (n.id = n2a.n_id)
-> Seq Scan on bug_test.n (cost=0.00..5652.69 rows=366869 width=16) (actual time=0.016..36.268 rows=366869 loops=1)
Output: n.id
-> Hash (cost=12757.91..12757.91 rows=31036 width=16) (actual time=16.897..16.897 rows=30578 loops=1)
Output: n2a.n_id
Buckets: 32768 Batches: 1 Memory Usage: 1690kB
-> Bitmap Heap Scan on bug_test.n2a (cost=996.96..12757.91 rows=31036 width=16) (actual time=2.533..10.997 rows=30578 loops=1)
Output: n2a.n_id
Recheck Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid)
Heap Blocks: exact=2814
-> Bitmap Index Scan on n2a_a_id_n_id_idx (cost=0.00..989.20 rows=31036 width=0) (actual time=2.190..2.190 rows=30578 loops=1)
Index Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid)
-> Hash (cost=1703.80..1703.80 rows=98180 width=16) (actual time=37.623..37.623 rows=98180 loops=1)
Output: n2h.n_id
Buckets: 131072 Batches: 2 Memory Usage: 3325kB
-> Seq Scan on bug_test.n2h (cost=0.00..1703.80 rows=98180 width=16) (actual time=0.019..15.656 rows=98180 loops=1)
Output: n2h.n_id
Planning Time: 0.760 ms
Execution Time: 500.745 ms
(27 rows)

Time: 502.287 ms
mnr=>

-----Original Message-----
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: Tuesday, July 23, 2019 1:52 PM
To: David Raymond <David(dot)Raymond(at)tomtom(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> I have a case where I'm doing a simple select from a table but I'm getting
> returned duplicates from its primary key field when I have two different
> exists statements in the where clause.

Hm. Seems like pretty clearly a planner bug --- the second plan seems
to be missing the de-duplication step that's done by the HashAggregate
in the first plan. But I don't want to try to reproduce it with just
the info in this report. Can you generate dummy or sanitized data to
make a self-contained test case that reproduces the problem?

It'd be useful to know also if the problem goes away if you disable
parallel query (SET max_parallel_workers_per_gather = 0, for instance).

Also, does the issue go away if you drop either or both of the foreign
key constraints?

regards, tom lane

In response to

Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field at 2019-07-23 17:52:04 from Tom Lane

Responses

RE: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field at 2019-07-23 21:42:06 from David Raymond

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Andres Freund	2019-07-23 20:00:39	Re: DROP STATISTICS results in "ERROR: tuple concurrently updated"
Previous Message	Tom Lane	2019-07-23 18:14:56	Re: DROP STATISTICS results in "ERROR: tuple concurrently updated"