BUG #16031: Group by returns duplicate groups

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: david(dot)raymond(at)tomtom(dot)com
Subject: BUG #16031: Group by returns duplicate groups
Date: 2019-09-30 18:12:27
Message-ID: 16031-4ef55395a5fbb687@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16031
Logged by: David Raymond
Email address: david(dot)raymond(at)tomtom(dot)com
PostgreSQL version: 11.5
Operating system: Windows 10
Description:

I have a large table which I'm running a query on: grouping on a single
field, returning only that one field, and filtering with a HAVING clause.
The results I'm getting back, however, contain a duplicate entry for one of
the returned values, which I believe should never be able to happen.

I'm working on dropping as many extra fields and records as I can while
still getting the bad result, so that it's small enough to send as a test
case, but figured I'd post this in the meantime to get any advice. I've
done a dump of the table and then restored it to a new table, and I still
get the same weird results. On the original version of the table I had
clustered it on the index that starts with the field being grouped on, but
it's still giving the bad groups after a dump and restore.

I'm running 11.5 on Windows 10 (Enterprise DB installer) and unfortunately
don't have the resources to build a new version myself if you come up with
a patch (which is also why I'm hoping to shrink this down to where others
can test it).
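In case it helps in the meantime, the shape of the reduced test case I'm aiming for would be something like the following (table and column names are the anonymized ones from the transcript below; the data that actually triggers the duplicate is what I'm still shrinking):

```sql
-- Sketch of the reduced test case: schema only, with just the
-- grouped column and a couple of the array-member columns.
CREATE TABLE small_repro (
    name        varchar(254) NOT NULL,
    arr_field_1 varchar(254),
    arr_field_2 varchar(254)
);

-- Same query shape as the failing one: group on a single column,
-- filter with HAVING on a count(DISTINCT ARRAY[...]) aggregate.
SELECT name
FROM small_repro
GROUP BY name
HAVING count(DISTINCT ARRAY[arr_field_1, arr_field_2]) > 1;
```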

Here's output from psql:

testing=> select version();
version
------------------------------------------------------------
PostgreSQL 11.5, compiled by Visual C++ build 1914, 64-bit
(1 row)

Time: 0.272 ms

testing=> \d+ big_table
                                          Table "name_stuff.big_table"
   Column    |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description
-------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 id_1        | uuid                   |           | not null |         | plain    |              |
 field_2     | uuid                   |           |          |         | plain    |              |
 name        | character varying(254) |           | not null |         | extended |              |
 field_4     | character varying(254) |           |          |         | extended |              |
 field_5     | numeric(2,0)           |           | not null |         | main     |              |
 field_6     | character varying(4)   |           |          |         | extended |              |
 field_7     | character varying(3)   |           | not null |         | extended |              |
 field_8     | character varying(3)   |           |          |         | extended |              |
 arr_field_1 | character varying(254) |           |          |         | extended |              |
 arr_field_2 | character varying(254) |           |          |         | extended |              |
 arr_field_3 | character varying(254) |           |          |         | extended |              |
 arr_field_4 | character varying(254) |           |          |         | extended |              |
 arr_field_5 | character varying(254) |           |          |         | extended |              |
 arr_field_6 | character varying(254) |           |          |         | extended |              |
 field_15    | boolean                |           |          |         | plain    |              |
 field_16    | boolean                |           |          |         | plain    |              |
 id_2        | text                   |           | not null |         | extended |              |
Indexes:
"big_table_pkey" PRIMARY KEY, btree (id_1, id_2)
"big_table_name_id_1_id_2_idx" btree (name, id_1, id_2)

testing=> select count(*) from big_table;
count
-------------
108,565,086
(1 row)

Time: 273770.205 ms (04:33.770)

testing=> explain (analyze, verbose, costs, buffers, timing, summary) create
table bad_groups_1 as select name from big_table group by name having
count(distinct array[arr_field_1, arr_field_2, arr_field_3, arr_field_4,
arr_field_5, arr_field_6]) > 1;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=29317810.81..30149735.19 rows=487417 width=22) (actual time=758501.326..895894.082 rows=745024 loops=1)
   Output: name
   Group Key: big_table.name
   Filter: (count(DISTINCT ARRAY[big_table.arr_field_1, big_table.arr_field_2, big_table.arr_field_3, big_table.arr_field_4, big_table.arr_field_5, big_table.arr_field_6]) > 1)
   Rows Removed by Filter: 80610652
   Buffers: shared hit=2325 read=1515081, temp read=2464481 written=2467410
   ->  Sort  (cost=29317810.81..29589026.23 rows=108486168 width=57) (actual time=758493.476..819035.136 rows=108565086 loops=1)
         Output: name, arr_field_1, arr_field_2, arr_field_3, arr_field_4, arr_field_5, arr_field_6
         Sort Key: big_table.name
         Sort Method: external merge  Disk: 4174488kB
         Buffers: shared hit=2317 read=1515081, temp read=2464481 written=2467410
         ->  Seq Scan on name_stuff.big_table  (cost=0.00..2602259.68 rows=108486168 width=57) (actual time=23.216..119113.708 rows=108565086 loops=1)
               Output: name, arr_field_1, arr_field_2, arr_field_3, arr_field_4, arr_field_5, arr_field_6
               Buffers: shared hit=2317 read=1515081
Planning Time: 0.196 ms
Execution Time: 897276.109 ms
(16 rows)

Time: 897285.808 ms (14:57.286)

testing=> \d+ bad_groups_1
                               Table "name_stuff.bad_groups_1"
 Column |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description
--------+------------------------+-----------+----------+---------+----------+--------------+-------------
 name   | character varying(254) |           |          |         | extended |              |

testing=> select count(*), count(distinct name) from bad_groups_1;
count | count
---------+---------
745,024 | 745,023
(1 row)

Time: 899.273 ms

testing=> select name from bad_groups_1 group by name having count(*) > 1;
name
------
DK
(1 row)

Time: 337.663 ms
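One more thing I can check while shrinking this down (just a diagnostic guess on my part): whether the two 'DK' group values are actually byte-for-byte identical, to rule out invisible differences such as trailing spaces or different byte sequences that merely display the same:

```sql
-- If octet_length or md5 differ between the two rows, the "duplicate"
-- groups are really two distinct byte strings; if they match, the
-- grouping itself produced a genuine duplicate.
SELECT name, octet_length(name), md5(name)
FROM bad_groups_1
WHERE name = 'DK';
```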

testing=>
