From: | David Raymond <David(dot)Raymond(at)tomtom(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | RE: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field |
Date: | 2019-07-23 18:14:59 |
Message-ID: | VI1PR07MB5792B53637F1DDC0A8A30D2B87C70@VI1PR07MB5792.eurprd07.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Dropping the two foreign key constrains didn't seem to alter the plan. Doing the "SET max_parallel_workers_per_gather = 0;" thing did alter it and it's giving consistent results.
I'll see if I can manufacture a smaller dataset that produces similar results. Since I cut it down to just UUIDs I don't think there'd be a problem with sending that other than the size of it all.
mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1
from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
count | count
-------+-------
8,858 | 6,531
(1 row)
Time: 151.132 ms
mnr=> set max_parallel_workers_per_gather = 0;
SET
Time: 0.234 ms
mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
count | count
-------+-------
6,531 | 6,531
(1 row)
Time: 457.642 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=24232.21..24232.22 rows=1 width=16) (actual time=499.999..499.999 rows=1 loops=1)
Output: count(*), count(DISTINCT n.id)
-> Hash Semi Join (cost=16556.91..24208.38 rows=4766 width=16) (actual time=98.205..495.173 rows=6531 loops=1)
Output: n.id
Hash Cond: (n.id = n2h.n_id)
-> Hash Join (cost=13145.86..19761.59 rows=30750 width=32) (actual time=59.001..133.273 rows=30578 loops=1)
Output: n.id, n2a.n_id
Inner Unique: true
Hash Cond: (n.id = n2a.n_id)
-> Seq Scan on bug_test.n (cost=0.00..5652.69 rows=366869 width=16) (actual time=0.016..36.268 rows=366869 loops=1)
Output: n.id
-> Hash (cost=12757.91..12757.91 rows=31036 width=16) (actual time=16.897..16.897 rows=30578 loops=1)
Output: n2a.n_id
Buckets: 32768 Batches: 1 Memory Usage: 1690kB
-> Bitmap Heap Scan on bug_test.n2a (cost=996.96..12757.91 rows=31036 width=16) (actual time=2.533..10.997 rows=30578 loops=1)
Output: n2a.n_id
Recheck Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid)
Heap Blocks: exact=2814
-> Bitmap Index Scan on n2a_a_id_n_id_idx (cost=0.00..989.20 rows=31036 width=0) (actual time=2.190..2.190 rows=30578 loops=1)
Index Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid)
-> Hash (cost=1703.80..1703.80 rows=98180 width=16) (actual time=37.623..37.623 rows=98180 loops=1)
Output: n2h.n_id
Buckets: 131072 Batches: 2 Memory Usage: 3325kB
-> Seq Scan on bug_test.n2h (cost=0.00..1703.80 rows=98180 width=16) (actual time=0.019..15.656 rows=98180 loops=1)
Output: n2h.n_id
Planning Time: 0.760 ms
Execution Time: 500.745 ms
(27 rows)
Time: 502.287 ms
mnr=>
David Raymond | Associate Quality Analyst - MPU Addressing | TomTom | Lebanon, NH, United States
e-mail: david(dot)raymond(at)tomtom(dot)com | office +1 603 306 8498 | www.tomtom.com
-----Original Message-----
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: Tuesday, July 23, 2019 1:52 PM
To: David Raymond <David(dot)Raymond(at)tomtom(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field
PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> I have a case where I'm doing a simple select from a table but I'm getting
> returned duplicates from its primary key field when I have two different
> exists statements in the where clause.
Hm. Seems like pretty clearly a planner bug --- the second plan seems
to be missing the de-duplication step that's done by the HashAggregate
in the first plan. But I don't want to try to reproduce it with just
the info in this report. Can you generate dummy or sanitized data to
make a self-contained test case that reproduces the problem?
It'd be useful to know also if the problem goes away if you disable
parallel query (SET max_parallel_workers_per_gather = 0, for instance).
Also, does the issue go away if you drop either or both of the foreign
key constraints?
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2019-07-23 20:00:39 | Re: DROP STATISTICS results in "ERROR: tuple concurrently updated" |
Previous Message | Tom Lane | 2019-07-23 18:14:56 | Re: DROP STATISTICS results in "ERROR: tuple concurrently updated" |