BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: david(dot)raymond(at)tomtom(dot)com
Subject: BUG #15922: Simple select with multiple exists filters returns duplicates from a primary key field
Date: 2019-07-23 16:48:16
Message-ID: 15922-969d62a9b4d95ee1@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 15922
Logged by: David Raymond
Email address: david(dot)raymond(at)tomtom(dot)com
PostgreSQL version: 11.4
Operating system: Windows 7
Description:

Running version 11.4 on Windows 7, EnterpriseDB installer.
I have a case where I'm doing a simple select from a table but I'm getting
returned duplicates from its primary key field when I have two different
exists statements in the where clause.

I simplified the tables down to the bare bones, dumped them out with pg_dump
and re-loaded them. Immediately after loading the data the query returns the
correct, consistent answer. Then after running analyze it starts returning
the duplicates.

The query is this:
select count(*), count(distinct id) from n where exists(select 1 from n2h
where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and a_id =
'00005831-4900-1200-0000-0000773ae45f');

If either one of the 2 exists statements is not included then the results
don't have duplicates, it's only when they are both included that there's a
problem.

The three tables involved have 2 million rows in them, so I can't include
the full dump in the form here. Below is output from psql starting
immediately after the fresh table load and before analyze is done. Shows the
table layouts, the record counts, and the explain results for the same query
before and after analyze is done.

Please let me know what the next bits you need from me are.
Thank you,
-David Raymond <david(dot)raymond(at)tomtom(dot)com>

mnr=> set search_path to bug_test;
SET
Time: 1.427 ms
mnr=> \d+
List of relations
Schema | Name | Type | Owner | Size | Description
----------+------+-------+-------+---------+-------------
bug_test | n | table | mnr | 16 MB |
bug_test | n2a | table | mnr | 89 MB |
bug_test | n2h | table | mnr | 5800 kB |
(3 rows)

mnr=> \d+ n
Table "bug_test.n"
Column | Type | Collation | Nullable | Default | Storage | Stats target |
Description
--------+------+-----------+----------+---------+---------+--------------+-------------
id | uuid | | not null | | plain | |
Indexes:
"n_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "n2a" CONSTRAINT "n2a_n_id_fkey" FOREIGN KEY (n_id) REFERENCES
n(id)
TABLE "n2h" CONSTRAINT "n2h_n_id_fkey" FOREIGN KEY (n_id) REFERENCES
n(id)

mnr=> \d+ n2a
Table "bug_test.n2a"
Column | Type | Collation | Nullable | Default | Storage | Stats target |
Description
--------+------+-----------+----------+---------+---------+--------------+-------------
n_id | uuid | | not null | | plain | |
a_id | uuid | | not null | | plain | |
Indexes:
"n2a_pkey" PRIMARY KEY, btree (n_id, a_id)
"n2a_a_id_n_id_idx" btree (a_id, n_id)
Foreign-key constraints:
"n2a_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id)

mnr=> \d+ n2h
Table "bug_test.n2h"
Column | Type | Collation | Nullable | Default | Storage | Stats target |
Description
--------+------+-----------+----------+---------+---------+--------------+-------------
n_id | uuid | | not null | | plain | |
h_id | uuid | | not null | | plain | |
Indexes:
"n2h_pkey" PRIMARY KEY, btree (n_id, h_id)
"n2h_h_id_n_id_idx" btree (h_id, n_id)
Foreign-key constraints:
"n2h_n_id_fkey" FOREIGN KEY (n_id) REFERENCES n(id)

mnr=> select 'n' as table_name, count(*) as record_count from n union all
select 'n2a', count(*) from n2a union all select 'n2h', count(*) from n2h;
table_name | record_count
------------+--------------
n | 366,869
n2a | 1,546,626
n2h | 98,180
(3 rows)

Time: 342.223 ms
mnr=> select count(*), count(distinct id) from n where exists(select 1 from
n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and
a_id = '00005831-4900-1200-0000-0000773ae45f');
count | count
-------+-------
6,531 | 6,531
(1 row)

Time: 426.043 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n
where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from
n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=10609.94..10609.95 rows=1 width=16) (actual
time=300.783..300.783 rows=1 loops=1)
Output: count(*), count(DISTINCT n.id)
-> Nested Loop (cost=1950.10..10590.61 rows=3866 width=16) (actual
time=30.074..295.967 rows=6531 loops=1)
Output: n.id
Inner Unique: true
Join Filter: (n2h.n_id = n.id)
-> Nested Loop (cost=1949.68..2044.12 rows=3866 width=32) (actual
time=30.064..270.423 rows=6531 loops=1)
Output: n2a.n_id, n2h.n_id
Inner Unique: true
-> HashAggregate (cost=1949.25..1951.25 rows=200 width=16)
(actual time=29.997..46.915 rows=61325 loops=1)
Output: n2h.n_id
Group Key: n2h.n_id
-> Seq Scan on bug_test.n2h (cost=0.00..1703.80
rows=98180 width=16) (actual time=0.023..6.345 rows=98180 loops=1)
Output: n2h.n_id, n2h.h_id
-> Index Only Scan using n2a_pkey on bug_test.n2a
(cost=0.43..1.67 rows=39 width=16) (actual time=0.003..0.003 rows=0
loops=61325)
Output: n2a.n_id, n2a.a_id
Index Cond: ((n2a.n_id = n2h.n_id) AND (n2a.a_id =
'00005831-4900-1200-0000-0000773ae45f'::uuid))
Heap Fetches: 6531
-> Index Only Scan using n_pkey on bug_test.n (cost=0.42..2.20
rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=6531)
Output: n.id
Index Cond: (n.id = n2a.n_id)
Heap Fetches: 6531
Planning Time: 0.601 ms
Execution Time: 301.614 ms
(24 rows)

Time: 302.919 ms
mnr=> analyze n, n2a, n2h;
ANALYZE
Time: 252.862 ms
mnr=> select count(*), count(distinct id) from n where exists(select 1 from
n2h where n_id = n.id) and exists(select 1 from n2a where n_id = n.id and
a_id = '00005831-4900-1200-0000-0000773ae45f');
count | count
-------+-------
8,858 | 6,531
(1 row)

Time: 170.372 ms
mnr=> explain analyze verbose select count(*), count(distinct id) from n
where exists(select 1 from n2h where n_id = n.id) and exists(select 1 from
n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=21318.14..21318.15 rows=1 width=16) (actual
time=148.410..148.410 rows=1 loops=1)
Output: count(*), count(DISTINCT n.id)
-> Gather (cost=7592.81..21291.01 rows=5426 width=16) (actual
time=65.294..146.754 rows=8858 loops=1)
Output: n.id
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=6592.81..19748.41 rows=2261 width=16)
(actual time=21.688..48.404 rows=2953 loops=3)
Output: n.id
Inner Unique: true
Worker 0: actual time=1.096..4.734 rows=24 loops=1
Worker 1: actual time=0.470..6.426 rows=225 loops=1
-> Parallel Hash Join (cost=6592.38..18216.45 rows=2263
width=32) (actual time=21.544..31.739 rows=2953 loops=3)
Output: n2a.n_id, n2h.n_id
Hash Cond: (n2a.n_id = n2h.n_id)
Worker 0: actual time=0.828..3.801 rows=24 loops=1
Worker 1: actual time=0.333..4.385 rows=225 loops=1
-> Parallel Bitmap Heap Scan on bug_test.n2a
(cost=1113.32..12667.03 rows=14457 width=16) (actual time=1.187..5.693
rows=10193 loops=3)
Output: n2a.n_id, n2a.a_id
Recheck Cond: (n2a.a_id =
'00005831-4900-1200-0000-0000773ae45f'::uuid)
Heap Blocks: exact=2173
Worker 0: actual time=0.196..2.202 rows=246
loops=1
Worker 1: actual time=0.105..2.785 rows=1394
loops=1
-> Bitmap Index Scan on n2a_a_id_n_id_idx
(cost=0.00..1104.65 rows=34696 width=0) (actual time=2.861..2.861 rows=30578
loops=1)
Index Cond: (n2a.a_id =
'00005831-4900-1200-0000-0000773ae45f'::uuid)
-> Parallel Hash (cost=4967.71..4967.71 rows=40908
width=16) (actual time=19.568..19.569 rows=32727 loops=3)
Output: n2h.n_id
Buckets: 131072 Batches: 1 Memory Usage:
5632kB
Worker 0: actual time=0.077..0.077 rows=0
loops=1
Worker 1: actual time=0.050..0.050 rows=0
loops=1
-> Parallel Index Only Scan using n2h_pkey on
bug_test.n2h (cost=0.42..4967.71 rows=40908 width=16) (actual
time=0.024..32.132 rows=98180 loops=1)
Output: n2h.n_id
Heap Fetches: 98180
-> Index Only Scan using n_pkey on bug_test.n
(cost=0.42..0.68 rows=1 width=16) (actual time=0.004..0.004 rows=1
loops=8858)
Output: n.id
Index Cond: (n.id = n2h.n_id)
Heap Fetches: 8858
Worker 0: actual time=0.035..0.035 rows=1 loops=24
Worker 1: actual time=0.007..0.007 rows=1 loops=225
Planning Time: 0.755 ms
Execution Time: 155.490 ms
(40 rows)

Time: 156.952 ms
mnr=> select count(*), count(distinct id) from n where exists (select 1 from
n2a where n_id = n.id and a_id = '00005831-4900-1200-0000-0000773ae45f');
count | count
--------+--------
30,578 | 30,578
(1 row)

Time: 95.252 ms
mnr=> select count(*), count(distinct id) from n where exists (select 1 from
n2h where n_id = n.id);
count | count
--------+--------
61,325 | 61,325
(1 row)

Time: 574.642 ms
mnr=> select version();
version
------------------------------------------------------------
PostgreSQL 11.4, compiled by Visual C++ build 1914, 64-bit
(1 row)

Time: 0.305 ms
mnr=>

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-07-23 16:50:40 Re: A function privilege problem
Previous Message Manuel Rigger 2019-07-23 12:01:56 DROP STATISTICS results in "ERROR: tuple concurrently updated"