Quick Links

RE: BUG #16031: Group by returns duplicate groups

From:	David Raymond <David(dot)Raymond(at)tomtom(dot)com>
To:	Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc:	"pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject:	RE: BUG #16031: Group by returns duplicate groups
Date:	2019-10-07 18:29:04
Message-ID:	VI1PR07MB60298C48FF9982EACF0B3F39879B0@VI1PR07MB6029.eurprd07.prod.outlook.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

Downloaded and installed 12.0, created a nice shiny new cluster, and confirmed that it's still doing it. Now in 12 you have to force it to materialize the CTE, which was why I had used a CTE in 11 in the first place.

testing=> select version();
version
------------------------------------------------------------
PostgreSQL 12.0, compiled by Visual C++ build 1914, 64-bit
(1 row)

testing=> analyze verbose weird_grouping;
INFO: analyzing "name_stuff.weird_grouping"
INFO: "weird_grouping": scanned 8316 of 8316 pages, containing 1297265 live rows and 0 dead rows; 30000 rows in sample, 1297265 estimated total rows
ANALYZE
Time: 206.577 ms
testing=> select count(*), count(distinct name) from weird_grouping;
count | count
-----------+-----------
1,297,265 | 1,176,103
(1 row)

Time: 6729.011 ms (00:06.729)
testing=> with foo as (select name from weird_grouping group by name) select name from foo group by name having count(*) > 1;
name
------
(0 rows)

Time: 7289.128 ms (00:07.289)
testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) with foo as (select name from weird_grouping group by name) select name from foo group by name having count(*) > 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=179613.72..200268.04 rows=67 width=20) (actual time=6203.868..6203.868 rows=0 loops=1)
Output: weird_grouping.name
Group Key: weird_grouping.name
Filter: (count(*) > 1)
Rows Removed by Filter: 1176103
Buffers: shared hit=2464 read=5852, temp read=7800 written=7830
-> Group (cost=179613.72..186100.05 rows=944366 width=20) (actual time=4769.781..5985.111 rows=1176103 loops=1)
Output: weird_grouping.name
Group Key: weird_grouping.name
Buffers: shared hit=2464 read=5852, temp read=7800 written=7830
-> Sort (cost=179613.72..182856.89 rows=1297265 width=20) (actual time=4769.779..5844.350 rows=1297265 loops=1)
Output: weird_grouping.name
Sort Key: weird_grouping.name
Sort Method: external merge Disk: 39048kB
Buffers: shared hit=2464 read=5852, temp read=7800 written=7830
-> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.059..102.772 rows=1297265 loops=1)
Output: weird_grouping.name
Buffers: shared hit=2464 read=5852
Settings: search_path = 'name_stuff'
Planning Time: 0.048 ms
Execution Time: 7115.761 ms
(21 rows)

Time: 7116.170 ms (00:07.116)
testing=> with foo as materialized (select name from weird_grouping group by name) select name from foo group by name having count(*) > 1;
name
-------
DCT
DELTA
(2 rows)

Time: 8850.833 ms (00:08.851)
testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) with foo as materialized (select name from weird_grouping group by name) select name from foo group by name having count(*) > 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=209709.20..209711.70 rows=67 width=516) (actual time=6676.811..6768.094 rows=2 loops=1)
Output: foo.name
Group Key: foo.name
Filter: (count(*) > 1)
Rows Removed by Filter: 1176099
Buffers: shared hit=2528 read=5788, temp read=7800 written=12363
CTE foo
-> Group (cost=179613.72..186100.05 rows=944366 width=20) (actual time=4774.681..6004.725 rows=1176103 loops=1)
Output: weird_grouping.name
Group Key: weird_grouping.name
Buffers: shared hit=2528 read=5788, temp read=7800 written=7830
-> Sort (cost=179613.72..182856.89 rows=1297265 width=20) (actual time=4774.678..5860.270 rows=1297265 loops=1)
Output: weird_grouping.name
Sort Key: weird_grouping.name
Sort Method: external merge Disk: 39048kB
Buffers: shared hit=2528 read=5788, temp read=7800 written=7830
-> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.065..101.141 rows=1297265 loops=1)
Output: weird_grouping.name
Buffers: shared hit=2528 read=5788
-> CTE Scan on foo (cost=0.00..18887.32 rows=944366 width=516) (actual time=4774.683..6228.002 rows=1176103 loops=1)
Output: foo.name
Buffers: shared hit=2528 read=5788, temp read=7800 written=12363
Settings: search_path = 'name_stuff'
Planning Time: 0.054 ms
Execution Time: 8786.597 ms
(25 rows)

Time: 8787.011 ms (00:08.787)
testing=>

-----Original Message-----
From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Sent: Sunday, October 6, 2019 10:29 AM
To: David Raymond <David(dot)Raymond(at)tomtom(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16031: Group by returns duplicate groups

>>>>> "David" == David Raymond <David(dot)Raymond(at)tomtom(dot)com> writes:

David> As an update, I've currently got a dump that consistently shows
David> weirdness when loaded. It's just the "name" field, has 1.3
David> million records, is 15 MB zipped, and has things garbled enough
David> that I don't mind sending it.

David> How small does it need to be before it's good to send to
David> someone?

That's small enough for me, though since I don't use Windows all I'll be
able to do is check if you're exposing some general PG bug. If not I'll
see if I can find someone to test on Windows.

--
Andrew (irc:RhodiumToad)

In response to

Re: BUG #16031: Group by returns duplicate groups at 2019-10-06 14:29:09 from Andrew Gierth

Responses

Re: BUG #16031: Group by returns duplicate groups at 2019-10-07 20:53:41 from Tomas Vondra

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Andrew Gierth	2019-10-07 18:29:11	Re: BUG #16031: Group by returns duplicate groups
Previous Message	Rob Emery	2019-10-07 15:52:41	Re: [PATCH] Re: BUG #16032: pg_basebackup when running on Windows doesn't clean up on failure correctly