Quick Links

Underestimated number of output rows with an aggregate function

From:	Philippe BEAUDOIN <phb(dot)emaj(at)free(dot)fr>
To:	"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Underestimated number of output rows with an aggregate function
Date:	2023-10-15 08:38:22
Message-ID:	cee9a6fc-f378-457e-8fec-f933f25f22b5@free.fr
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi all,

Working on the emaj extension (for the curious ones,
https://emaj.readthedocs.io/en/latest/ and
https://github.com/dalibo/emaj) I recently faced a performance problem
when querying and aggregating data changes. A query with 3 CTE has a O^2
behavior (https://explain.dalibo.com/plan/1ded242d4ebf3gch#plan) I have
found a workaround by setting enable_nestloop to FALSE. But this has
drawbacks. So I want to better understand the issue.

During my analysis, I realized that the output rows estimate of the
second CTE is really bad, leading to a bad plan for the next CTE.

I reproduced the issue in a very small test case with a simplified
query. Attached is a shell script and its output.

A simple table is created, filled and analyzed.

The simplified statement is:
WITH keys AS (
   SELECT c1, min(seq) AS seq FROM perf GROUP BY c1
   )
SELECT tbl.*
   FROM perf tbl JOIN keys ON (keys.c1 = tbl.c1 AND keys.seq = tbl.seq);

Its plan is:
Hash Join (cost=958.00..1569.00 rows=1 width=262) (actual
time=18.516..30.702 rows=10000 loops=1)
   Output: tbl.c1, tbl.seq, tbl.c2
   Inner Unique: true
   Hash Cond: ((tbl.c1 = perf.c1) AND (tbl.seq = (min(perf.seq))))
   Buffers: shared hit=856
   -> Seq Scan on public.perf tbl (cost=0.00..548.00 rows=12000
width=262) (actual time=0.007..2.323 rows=12000 loops=1)
         Output: tbl.c1, tbl.seq, tbl.c2
         Buffers: shared hit=428
   -> Hash (cost=808.00..808.00 rows=10000 width=8) (actual
time=18.480..18.484 rows=10000 loops=1)
         Output: perf.c1, (min(perf.seq))
         Buckets: 16384 Batches: 1 Memory Usage: 519kB
         Buffers: shared hit=428
         -> HashAggregate (cost=608.00..708.00 rows=10000 width=8)
(actual time=10.688..14.321 rows=10000 loops=1)
               Output: perf.c1, min(perf.seq)
               Group Key: perf.c1
               Batches: 1 Memory Usage: 1425kB
               Buffers: shared hit=428
               -> Seq Scan on public.perf (cost=0.00..548.00
rows=12000 width=8) (actual time=0.002..2.330 rows=12000 loops=1)
                     Output: perf.c1, perf.seq, perf.c2
                     Buffers: shared hit=428

It globally looks good to me, with 2 sequential scans and a hash join.
But the number of returned rows estimate is always 1, while it actually
depends on the data content (here 10000).

For the hash join node, the plan shows a "Inner Unique: true" property.
I wonder if this is normal. It look likes the optimizer doesn't take
into account the presence of the GROUP BY clause in its estimate.

I reproduce the case with all supported postgres versions.

Thanks by advance for any explanation.
Philippe.

Attachment	Content-Type	Size
test_case.sh	application/x-shellscript	1.0 KB
test_case.log	text/x-log	2.8 KB

Responses

Re: Underestimated number of output rows with an aggregate function at 2023-10-15 16:37:29 from Tom Lane

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Tom Lane	2023-10-15 16:37:29	Re: Underestimated number of output rows with an aggregate function
Previous Message	Tom Lane	2023-10-13 20:26:25	Re: Postgres 15 SELECT query doesn't use index under RLS