The issue of incorrect width estimation in UNION queries

From: sunw(dot)fnst <936739278(at)qq(dot)com>
To: pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: The issue of incorrect width estimation in UNION queries
Date: 2025-09-24 06:13:32
Message-ID: tencent_34CF8017AB81944A4C08DD089D410AB6C306@qq.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi

When I perform a union operation on the varchar fields of two tables,
the planner incorrectly selects the sort method. But the efficiency of hashagg will be higher.
This seems to be because the planner has miscalculated the length of the columns.

The test cases are as follows:
(version is PostgreSQL 17.4)

CREATE TABLE public.t1 (
&nbsp; &nbsp; id integer primary key,
&nbsp; &nbsp; name varchar(10)
);

INSERT INTO public.t1 (id, name)
SELECT&nbsp; generate_series(1, 100000) AS id,'hgbnmjujty' AS name;
&nbsp; &nbsp;
CREATE TABLE public.t2 (
&nbsp; &nbsp; id integer primary key,
&nbsp; &nbsp; name varchar(10)
);

INSERT INTO public.t2 (id, name)
SELECT&nbsp; generate_series(1, 100000) AS id,'mnyknvkuma' AS name;

set work_mem to '256kB';
explain analyze select name from t1 union select name from t2;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;QUERY PLAN

-----------------------------------------------------------------------------------------------------------------
------------
&nbsp;Unique&nbsp; (cost=38103.14..39103.14 rows=200000 width=38) (actual time=84.674..127.446 rows=2 loops=1)
&nbsp; &nbsp;-&gt;&nbsp; Sort&nbsp; (cost=38103.14..38603.14 rows=200000 width=38) (actual time=84.670..111.885 rows=200000 loops=1)
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Sort Key: t1.name
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Sort Method: external merge&nbsp; Disk: 2976kB
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-&gt;&nbsp; Append&nbsp; (cost=0.00..4082.00 rows=200000 width=38) (actual time=0.016..40.349 rows=200000 loops=1)
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-&gt;&nbsp; Seq Scan on t1&nbsp; (cost=0.00..1541.00 rows=100000 width=11) (actual time=0.015..11.187 rows=1000
00 loops=1)
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-&gt;&nbsp; Seq Scan on t2&nbsp; (cost=0.00..1541.00 rows=100000 width=11) (actual time=0.011..11.098 rows=1000
00 loops=1)
&nbsp;Planning Time: 0.098 ms
&nbsp;Execution Time: 127.757 ms

set enable_sort to off;
explain analyze select name from t1 union select name from t2;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; QUERY PLAN

-----------------------------------------------------------------------------------------------------------------
------
&nbsp;HashAggregate&nbsp; (cost=37582.00..45832.00 rows=200000 width=38) (actual time=69.529..69.550 rows=2 loops=1)
&nbsp; &nbsp;Group Key: t1.name
&nbsp; &nbsp;Planned Partitions: 16&nbsp; Batches: 1&nbsp; Memory Usage: 121kB
&nbsp; &nbsp;-&gt;&nbsp; Append&nbsp; (cost=0.00..4082.00 rows=200000 width=38) (actual time=0.016..39.050 rows=200000 loops=1)
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-&gt;&nbsp; Seq Scan on t1&nbsp; (cost=0.00..1541.00 rows=100000 width=11) (actual time=0.015..10.056 rows=100000 loo
ps=1)
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-&gt;&nbsp; Seq Scan on t2&nbsp; (cost=0.00..1541.00 rows=100000 width=11) (actual time=0.018..10.654 rows=100000 loo
ps=1)
&nbsp;Planning Time: 0.105 ms
&nbsp;Execution Time: 69.606 ms

In the execution plan, the width of the name field is estimated to be 38,
but the name fields in both tables are strings with a length of 10.
It seems that this is the reason why the optimizer incorrectly estimated the data size.

Regards

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Amit Kapila 2025-09-24 08:34:12 Re: Lock timeouts and unusual spikes in replication lag with logical parallel transaction streaming
Previous Message David Rowley 2025-09-24 05:21:11 Re: BUG #19061: I downloaded version 17 of postgreSQL and it was lagging very much