RE: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field

From: David Raymond <David(dot)Raymond(at)tomtom(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field
Date: 2019-07-23 21:42:06
Message-ID: VI1PR07MB579234388D19A00C5C7B2A4787C70@VI1PR07MB5792.eurprd07.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Update so far: I did manage to go and replace all the UUIDs with random ones and it's still doing it, so I do have a sanitized version now. No real luck with trimming down the record count though. When deleting too many records it would change the query plan to something not broken. Even after replacing the UUIDs and not deleting anything I ran analyze and it came up clean, and I had to vacuum analyze for it to pick the broken plan again. (That example pasted below) The dump file is at least consistently doing the same thing where immediately after load the plan chosen gives a consistent answer, but once analyzed it gives the bad duplicates. As it stands the dump file is 130 MB (30MB zipped), is that too big to send in to you?

mnr=> analyze n, n2a, n2h;
ANALYZE
Time: 13152.868 ms (00:13.153)
mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');
count | count
-------+-------
6,531 | 6,531
(1 row)

Time: 14685.529 ms (00:14.686)
mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=57772.45..57772.46 rows=1 width=16) (actual time=877.457..877.457 rows=1 loops=1)
Output: count(*), count(DISTINCT n.id)
-> Gather (cost=6251.99..57749.48 rows=4595 width=16) (actual time=568.882..890.117 rows=6531 loops=1)
Output: n.id
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=5251.99..56289.98 rows=1915 width=16) (actual time=451.244..752.614 rows=2177 loops=3)
Output: n.id
Inner Unique: true
Join Filter: (n2h.n_id = n2a.n_id)
Worker 0: actual time=372.423..675.329 rows=2007 loops=1
Worker 1: actual time=415.902..719.048 rows=2483 loops=1
-> Parallel Hash Semi Join (cost=5251.43..15379.95 rows=23878 width=32) (actual time=67.824..159.168 rows=20442 loops=3)
Output: n.id, n2h.n_id
Hash Cond: (n.id = n2h.n_id)
Worker 0: actual time=0.718..86.042 rows=18097 loops=1
Worker 1: actual time=26.684..123.731 rows=21898 loops=1
-> Parallel Seq Scan on bug_test.n (cost=0.00..9461.62 rows=152862 width=16) (actual time=30.176..69.082 rows=122290 loops=3)
Output: n.id
Worker 0: actual time=0.456..29.446 rows=107115 loops=1
Worker 1: actual time=25.198..58.556 rows=137270 loops=1
-> Parallel Hash (cost=4740.08..4740.08 rows=40908 width=16) (actual time=29.188..29.189 rows=32727 loops=3)
Output: n2h.n_id
Buckets: 131072 Batches: 1 Memory Usage: 5632kB
Worker 0: actual time=0.049..0.049 rows=0 loops=1
Worker 1: actual time=0.052..0.052 rows=0 loops=1
-> Parallel Seq Scan on bug_test.n2h (cost=0.00..4740.08 rows=40908 width=16) (actual time=22.989..61.789 rows=98180 loops=1)
Output: n2h.n_id
-> Index Only Scan using n2a_pkey on bug_test.n2a (cost=0.55..1.70 rows=1 width=16) (actual time=0.027..0.027 rows=0 loops=61325)
Output: n2a.n_id, n2a.a_id
Index Cond: ((n2a.n_id = n.id) AND (n2a.a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'::uuid))
Heap Fetches: 6531
Worker 0: actual time=0.031..0.031 rows=0 loops=18097
Worker 1: actual time=0.025..0.025 rows=0 loops=21898
Planning Time: 0.940 ms
Execution Time: 896.125 ms
(36 rows)

Time: 898.028 ms
mnr=> vacuum full freeze verbose analyze n, n2a, n2h;
INFO: vacuuming "bug_test.n"
INFO: "n": found 0 removable, 366869 nonremovable row versions in 7933 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 0.10 s, system: 0.12 s, elapsed: 0.71 s.
INFO: analyzing "bug_test.n"
INFO: "n": scanned 1984 of 1984 pages, containing 366869 live rows and 0 dead rows; 30000 rows in sample, 366869 estimated total rows
INFO: vacuuming "bug_test.n2a"
INFO: "n2a": found 3093218 removable, 1546626 nonremovable row versions in 56861 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 1.01 s, system: 0.96 s, elapsed: 3.92 s.
INFO: analyzing "bug_test.n2a"
INFO: "n2a": scanned 11373 of 11373 pages, containing 1546626 live rows and 0 dead rows; 30000 rows in sample, 1546626 estimated total rows
INFO: vacuuming "bug_test.n2h"
INFO: "n2h": found 96 removable, 98180 nonremovable row versions in 4331 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 0.06 s, system: 0.03 s, elapsed: 0.24 s.
INFO: analyzing "bug_test.n2h"
INFO: "n2h": scanned 722 of 722 pages, containing 98180 live rows and 0 dead rows; 30000 rows in sample, 98180 estimated total rows
VACUUM
Time: 30115.734 ms (00:30.116)
mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');
count | count
-------+-------
8,858 | 6,531
(1 row)

Time: 193.787 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=22165.61..22165.62 rows=1 width=16) (actual time=209.476..209.477 rows=1 loops=1)
Output: count(*), count(DISTINCT n.id)
-> Gather (cost=8697.91..22141.26 rows=4871 width=16) (actual time=80.311..247.967 rows=8858 loops=1)
Output: n.id
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=7697.91..20654.16 rows=2030 width=16) (actual time=25.238..47.870 rows=2953 loops=3)
Output: n.id
Inner Unique: true
Worker 0: actual time=0.135..0.135 rows=0 loops=1
Worker 1: actual time=0.174..0.174 rows=0 loops=1
-> Parallel Hash Join (cost=7697.49..19295.63 rows=2007 width=32) (actual time=25.229..31.540 rows=2953 loops=3)
Output: n2a.n_id, n2h.n_id
Hash Cond: (n2a.n_id = n2h.n_id)
Worker 0: actual time=0.135..0.135 rows=0 loops=1
Worker 1: actual time=0.173..0.173 rows=0 loops=1
-> Parallel Bitmap Heap Scan on bug_test.n2a (cost=1001.75..12536.94 rows=12975 width=16) (actual time=2.951..6.906 rows=30578 loops=1)
Output: n2a.n_id, n2a.a_id
Recheck Cond: (n2a.a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'::uuid)
Heap Blocks: exact=226
-> Bitmap Index Scan on n2a_a_id_n_id_idx (cost=0.00..993.97 rows=31139 width=0) (actual time=2.904..2.905 rows=30578 loops=1)
Index Cond: (n2a.a_id = '23ce1ae4-eb99-486a-924e-2fe74180a884'::uuid)
-> Parallel Hash (cost=6184.38..6184.38 rows=40908 width=16) (actual time=23.653..23.654 rows=32727 loops=3)
Output: n2h.n_id
Buckets: 131072 Batches: 1 Memory Usage: 5632kB
Worker 0: actual time=0.041..0.041 rows=0 loops=1
Worker 1: actual time=0.040..0.040 rows=0 loops=1
-> Parallel Index Only Scan using n2h_pkey on bug_test.n2h (cost=0.42..6184.38 rows=40908 width=16) (actual time=0.022..49.775 rows=98180 loops=1)
Output: n2h.n_id
Heap Fetches: 98180
-> Index Only Scan using n_pkey on bug_test.n (cost=0.42..0.68 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=8858)
Output: n.id
Index Cond: (n.id = n2h.n_id)
Heap Fetches: 8858
Planning Time: 0.567 ms
Execution Time: 255.986 ms
(36 rows)

Time: 257.446 ms
mnr=>

David Raymond | Associate Quality Analyst - MPU Addressing | TomTom | Lebanon, NH, United States
e-mail: david(dot)raymond(at)tomtom(dot)com  | office +1 603 306 8498 | www.tomtom.com

-----Original Message-----
From: David Raymond
Sent: Tuesday, July 23, 2019 2:15 PM
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: RE: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field

Dropping the two foreign key constrains didn't seem to alter the plan. Doing the "SET max_parallel_workers_per_gather = 0;" thing did alter it and it's giving consistent results.

I'll see if I can manufacture a smaller dataset that produces similar results. Since I cut it down to just UUIDs I don't think there'd be a problem with sending that other than the size of it all.

mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1
from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
count | count
-------+-------
8,858 | 6,531
(1 row)

Time: 151.132 ms
mnr=> set max_parallel_workers_per_gather = 0;
SET
Time: 0.234 ms
mnr=> select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
count | count
-------+-------
6,531 | 6,531
(1 row)

Time: 457.642 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n where exists (select 1 from n2h where n_id = n.id) and exists (select 1 from n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=24232.21..24232.22 rows=1 width=16) (actual time=499.999..499.999 rows=1 loops=1)
Output: count(*), count(DISTINCT n.id)
-> Hash Semi Join (cost=16556.91..24208.38 rows=4766 width=16) (actual time=98.205..495.173 rows=6531 loops=1)
Output: n.id
Hash Cond: (n.id = n2h.n_id)
-> Hash Join (cost=13145.86..19761.59 rows=30750 width=32) (actual time=59.001..133.273 rows=30578 loops=1)
Output: n.id, n2a.n_id
Inner Unique: true
Hash Cond: (n.id = n2a.n_id)
-> Seq Scan on bug_test.n (cost=0.00..5652.69 rows=366869 width=16) (actual time=0.016..36.268 rows=366869 loops=1)
Output: n.id
-> Hash (cost=12757.91..12757.91 rows=31036 width=16) (actual time=16.897..16.897 rows=30578 loops=1)
Output: n2a.n_id
Buckets: 32768 Batches: 1 Memory Usage: 1690kB
-> Bitmap Heap Scan on bug_test.n2a (cost=996.96..12757.91 rows=31036 width=16) (actual time=2.533..10.997 rows=30578 loops=1)
Output: n2a.n_id
Recheck Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid)
Heap Blocks: exact=2814
-> Bitmap Index Scan on n2a_a_id_n_id_idx (cost=0.00..989.20 rows=31036 width=0) (actual time=2.190..2.190 rows=30578 loops=1)
Index Cond: (n2a.a_id = '00005831-4900-1200-0000-0000773ae45f'::uuid)
-> Hash (cost=1703.80..1703.80 rows=98180 width=16) (actual time=37.623..37.623 rows=98180 loops=1)
Output: n2h.n_id
Buckets: 131072 Batches: 2 Memory Usage: 3325kB
-> Seq Scan on bug_test.n2h (cost=0.00..1703.80 rows=98180 width=16) (actual time=0.019..15.656 rows=98180 loops=1)
Output: n2h.n_id
Planning Time: 0.760 ms
Execution Time: 500.745 ms
(27 rows)

Time: 502.287 ms
mnr=>

David Raymond | Associate Quality Analyst - MPU Addressing | TomTom | Lebanon, NH, United States
e-mail: david(dot)raymond(at)tomtom(dot)com  | office +1 603 306 8498 | www.tomtom.com

-----Original Message-----
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: Tuesday, July 23, 2019 1:52 PM
To: David Raymond <David(dot)Raymond(at)tomtom(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> I have a case where I'm doing a simple select from a table but I'm getting
> returned duplicates from its primary key field when I have two different
> exists statements in the where clause.

Hm. Seems like pretty clearly a planner bug --- the second plan seems
to be missing the de-duplication step that's done by the HashAggregate
in the first plan. But I don't want to try to reproduce it with just
the info in this report. Can you generate dummy or sanitized data to
make a self-contained test case that reproduces the problem?

It'd be useful to know also if the problem goes away if you disable
parallel query (SET max_parallel_workers_per_gather = 0, for instance).

Also, does the issue go away if you drop either or both of the foreign
key constraints?

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-07-23 21:49:39 Re: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field
Previous Message Andres Freund 2019-07-23 20:00:39 Re: DROP STATISTICS results in "ERROR: tuple concurrently updated"