Re: BUG #16031: Group by returns duplicate groups

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Raymond <David(dot)Raymond(at)tomtom(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16031: Group by returns duplicate groups
Date: 2019-10-07 20:53:41
Message-ID: 20191007205341.koxjl5etfoqg4y7l@development

Hi,

On Mon, Oct 07, 2019 at 06:29:04PM +0000, David Raymond wrote:
>Downloaded and installed 12.0, created a nice shiny new cluster, and
>confirmed that it's still doing it. Now in 12 you have to force it to
>materialize the CTE, which was why I had used a CTE in 11 in the first
>place.
>

I've tried running this on a 12.0 cluster too (on Linux, though), using
the data set linked by Andrew, and I haven't observed any duplicate rows
either. I do have some observations, though ...

>
>testing=> select version();
> version
>------------------------------------------------------------
> PostgreSQL 12.0, compiled by Visual C++ build 1914, 64-bit
>(1 row)
>
>Time: 0.148 ms
>testing=> \d+ weird_grouping
> Table "name_stuff.weird_grouping"
> Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
>--------+------------------------+-----------+----------+---------+----------+--------------+-------------
> name | character varying(254) | | not null | | extended | |
>Access method: heap
>
>testing=> analyze verbose weird_grouping;
>INFO: analyzing "name_stuff.weird_grouping"
>INFO: "weird_grouping": scanned 8316 of 8316 pages, containing 1297265 live rows and 0 dead rows; 30000 rows in sample, 1297265 estimated total rows
>ANALYZE
>Time: 206.577 ms
>testing=> select count(*), count(distinct name) from weird_grouping;
> count | count
>-----------+-----------
> 1,297,265 | 1,176,103
>(1 row)
>

I do get this:

count | count
---------+---------
1297265 | 1176101
(1 row)

>Time: 6729.011 ms (00:06.729)
>testing=> with foo as (select name from weird_grouping group by name) select name from foo group by name having count(*) > 1;
> name
>------
>(0 rows)
>
>Time: 7289.128 ms (00:07.289)
>testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) with foo as (select name from weird_grouping group by name) select name from foo group by name having count(*) > 1;
> QUERY PLAN
>--------------------------------------------------------------------------------------------------------------------------------------------------------
> GroupAggregate (cost=179613.72..200268.04 rows=67 width=20) (actual time=6203.868..6203.868 rows=0 loops=1)
> Output: weird_grouping.name
> Group Key: weird_grouping.name
> Filter: (count(*) > 1)
> Rows Removed by Filter: 1176103
> Buffers: shared hit=2464 read=5852, temp read=7800 written=7830
> -> Group (cost=179613.72..186100.05 rows=944366 width=20) (actual time=4769.781..5985.111 rows=1176103 loops=1)
> Output: weird_grouping.name
> Group Key: weird_grouping.name
> Buffers: shared hit=2464 read=5852, temp read=7800 written=7830
> -> Sort (cost=179613.72..182856.89 rows=1297265 width=20) (actual time=4769.779..5844.350 rows=1297265 loops=1)
> Output: weird_grouping.name
> Sort Key: weird_grouping.name
> Sort Method: external merge Disk: 39048kB
> Buffers: shared hit=2464 read=5852, temp read=7800 written=7830
> -> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.059..102.772 rows=1297265 loops=1)
> Output: weird_grouping.name
> Buffers: shared hit=2464 read=5852
> Settings: search_path = 'name_stuff'
> Planning Time: 0.048 ms
> Execution Time: 7115.761 ms
>(21 rows)
>

I'd like to point out that this may not actually be correct either. It
does not produce any rows, i.e. it does not see any "duplicate groups",
but it does this:

-> Group (cost=179613.72..186100.05 rows=944366 width=20) (actual time=4769.781..5985.111 rows=1176103 loops=1)

Notice that it reports 1176103 groups, but is that actually correct? On my
machine I only see 1176101 groups, and the difference (2) would match
the DCT and DELTA duplicates shown by the next query.
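A quick way to cross-check this (just a sketch, using the same
weird_grouping table and name column from your session) is to compare the
number of distinct names with the number of groups the GROUP BY actually
produces:

  select count(distinct name) as distinct_names,
         (select count(*)
            from (select name from weird_grouping group by name) g) as group_count
    from weird_grouping;

The two numbers should be identical; any difference means the GROUP BY is
emitting duplicate groups.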

>Time: 7116.170 ms (00:07.116)
>testing=> with foo as materialized (select name from weird_grouping group by name) select name from foo group by name having count(*) > 1;
> name
>-------
> DCT
> DELTA
>(2 rows)
>
>Time: 8850.833 ms (00:08.851)
>testing=> explain (analyze, verbose, costs, settings, buffers, timing, summary) with foo as materialized (select name from weird_grouping group by name) select name from foo group by name having count(*) > 1;
> QUERY PLAN
>----------------------------------------------------------------------------------------------------------------------------------------------------------
> HashAggregate (cost=209709.20..209711.70 rows=67 width=516) (actual time=6676.811..6768.094 rows=2 loops=1)
> Output: foo.name
> Group Key: foo.name
> Filter: (count(*) > 1)
> Rows Removed by Filter: 1176099
> Buffers: shared hit=2528 read=5788, temp read=7800 written=12363
> CTE foo
> -> Group (cost=179613.72..186100.05 rows=944366 width=20) (actual time=4774.681..6004.725 rows=1176103 loops=1)
> Output: weird_grouping.name
> Group Key: weird_grouping.name
> Buffers: shared hit=2528 read=5788, temp read=7800 written=7830
> -> Sort (cost=179613.72..182856.89 rows=1297265 width=20) (actual time=4774.678..5860.270 rows=1297265 loops=1)
> Output: weird_grouping.name
> Sort Key: weird_grouping.name
> Sort Method: external merge Disk: 39048kB
> Buffers: shared hit=2528 read=5788, temp read=7800 written=7830
> -> Seq Scan on name_stuff.weird_grouping (cost=0.00..21288.65 rows=1297265 width=20) (actual time=0.065..101.141 rows=1297265 loops=1)
> Output: weird_grouping.name
> Buffers: shared hit=2528 read=5788
> -> CTE Scan on foo (cost=0.00..18887.32 rows=944366 width=516) (actual time=4774.683..6228.002 rows=1176103 loops=1)
> Output: foo.name
> Buffers: shared hit=2528 read=5788, temp read=7800 written=12363
> Settings: search_path = 'name_stuff'
> Planning Time: 0.054 ms
> Execution Time: 8786.597 ms
>(25 rows)
>
>Time: 8787.011 ms (00:08.787)
>testing=>
>

Can you try running this with enable_hashagg = off? That should give you
another Sort on the CTE Scan, and a GroupAggregate at the top. I wonder
if that makes the issue go away ...
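Something along these lines (just for illustration, with a session-level
SET so it does not affect anything else):

  set enable_hashagg = off;
  explain (analyze, verbose, buffers)
  with foo as materialized (select name from weird_grouping group by name)
  select name from foo group by name having count(*) > 1;
  reset enable_hashagg;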

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
