RE: BUG #16031: Group by returns duplicate groups

From: David Raymond <David(dot)Raymond(at)tomtom(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: BUG #16031: Group by returns duplicate groups
Date: 2019-10-08 13:36:50
Message-ID: VI1PR07MB6029587F019D6F4735411CC5879A0@VI1PR07MB6029.eurprd07.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

" Can you try running this with enable_hashagg = off? That should give you another Sort on the CTE Scan, and a GroupAggregate at the top. I wonder if that makes the issue go away ..."

Here's what I get for that. And extra weirdness below:

testing=> set enable_hashagg = off;
SET
Time: 0.241 ms
testing=> with foo as materialized (select name from weird_grouping group by name) select name from foo group by name having count(*) > 1;
name
------
(0 rows)

Time: 10423.486 ms (00:10.423)
testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) with foo as materialized (select name from weird_grouping group by name) select name from foo group by name having count(*) > 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=737694.91..744780.15 rows=67 width=516) (actual time=7650.411..7650.411 rows=0 loops=1)
Output: foo.name
Group Key: foo.name
Filter: (count(*) > 1)
Rows Removed by Filter: 1176103
Buffers: shared hit=160 read=8156, temp read=14645 written=19235
CTE foo
-> Group (cost=179613.72..186100.05 rows=944366 width=20) (actual time=4811.449..6027.355 rows=1176103 loops=1)
Output: weird_grouping.name
Group Key: weird_grouping.name
Buffers: shared hit=160 read=8156, temp read=7800 written=7830
-> Sort (cost=179613.72..182856.89 rows=1297265 width=20) (actual time=4811.447..5884.667 rows=1297265 loops=1)
Output: weird_grouping.name
Sort Key: weird_grouping.name
Sort Method: external merge Disk: 39048kB
Buffers: shared hit=160 read=8156, temp read=7800 written=7830
-> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.058..117.833 rows=1297265 loops=1)
Output: weird_grouping.name
Buffers: shared hit=160 read=8156
-> Sort (cost=551594.86..553955.78 rows=944366 width=516) (actual time=6915.562..7418.978 rows=1176103 loops=1)
Output: foo.name
Sort Key: foo.name
Sort Method: external merge Disk: 36368kB
Buffers: shared hit=160 read=8156, temp read=14645 written=19235
-> CTE Scan on foo (cost=0.00..18887.32 rows=944366 width=516) (actual time=4811.451..6243.160 rows=1176103 loops=1)
Output: foo.name
Buffers: shared hit=160 read=8156, temp read=7800 written=12363
Settings: enable_hashagg = 'off', search_path = 'name_stuff'
Planning Time: 0.064 ms
Execution Time: 10175.478 ms
(30 rows)

Time: 10175.906 ms (00:10.176)
testing=>

But now here's another weird bit:

testing=> select count(*), count(distinct name) from weird_grouping;
count | count
-----------+-----------
1,297,265 | 1,176,103
(1 row)

Time: 6866.369 ms (00:06.866)
testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) select count(*), count(distinct name) from weird_grouping;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=27774.98..27774.99 rows=1 width=16) (actual time=6642.856..6642.856 rows=1 loops=1)
Output: count(*), count(DISTINCT name)
Buffers: shared hit=928 read=7388, temp read=5484 written=5506
-> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.072..110.798 rows=1297265 loops=1)
Output: name
Buffers: shared hit=928 read=7388
Settings: search_path = 'name_stuff'
Planning Time: 0.030 ms
Execution Time: 6642.875 ms
(9 rows)

Time: 6643.181 ms (00:06.643)
testing=> select count(*), count(distinct name collate "C") from weird_grouping;
count | count
-----------+-----------
1,297,265 | 1,176,101
(1 row)

Time: 1655.202 ms (00:01.655)
testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) select count(*), count(distinct name collate "C") from weird_grouping;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=27774.98..27774.99 rows=1 width=16) (actual time=1788.276..1788.277 rows=1 loops=1)
Output: count(*), count(DISTINCT (name)::character varying(254))
Buffers: shared hit=992 read=7324, temp read=5484 written=5506
-> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.059..112.815 rows=1297265 loops=1)
Output: name
Buffers: shared hit=992 read=7324
Settings: search_path = 'name_stuff'
Planning Time: 0.030 ms
Execution Time: 1788.295 ms
(9 rows)

Time: 1788.596 ms (00:01.789)
testing=> show lc_collate;
lc_collate
------------
en-US
(1 row)

Time: 0.122 ms
testing=> show server_encoding;
server_encoding
-----------------
UTF8
(1 row)

Time: 0.082 ms
testing=> select count(*), count(distinct name collate "en-US") from weird_grouping;
ERROR: collation "en-US" for encoding "UTF8" does not exist
LINE 1: select count(*), count(distinct name collate "en-US") from w...
^
Time: 5.759 ms
testing=>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-10-08 15:38:44 Re: Potential to_date(string, string) function malfunction
Previous Message Pavel Pleva 2019-10-08 12:30:56 Potential to_date(string, string) function malfunction