Wrapping a select in another select makes it slower

From: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
To: pgsql-general(at)postgresql(dot)org
Subject: Wrapping a select in another select makes it slower
Date: 2025-07-22 15:17:35
Message-ID: 20250722151735.toridqqnb3krhimp@hjp.at
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-general

PostgreSQL version 17.5 on Ubuntu 24.04 (PGDG build).

Context: We use CheckMK to monitor our systems. This also reports
various health indicators for Postgres. Recently a colleague noticed
that the indicators for one database were missing, presumable since
upgrading to Ubuntu 24.04 and Postgres 17. Closer investigation showed
that the plugin was still running but took a long time (almost 4
minutes) to return a result (which caused a timeout).

So I checked which of the many queries was slow and found the culprit.
I'm not including the whole query here (it's a mess of of many nested
subqueries) only the interesting part.

This query returns a result quickly (less than 0.5 seconds):

SELECT
ns.nspname, tbl.relname, hdr, ma, bs,
SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS datawidth,
MAX(coalesce(null_frac,0)) AS maxfracsum,
hdr+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND s2.tablename = tbl.relname
) AS nullhdr
FROM pg_attribute att
JOIN pg_class tbl ON att.attrelid = tbl.oid
JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
LEFT JOIN pg_stats s ON s.schemaname=ns.nspname AND s.tablename = tbl.relname AND s.inherited=false AND s.attname=att.attname,
(
SELECT
( SELECT current_setting('block_size')::numeric) AS bs,
CASE WHEN SUBSTRING(SPLIT_PART(v, ' ', 2) FROM '#\[0-9]+.[0-9]+#\%' for '#') IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ 'mingw32' OR v ~ '64-bit' THEN 8 ELSE 4 END AS ma
FROM (
SELECT version() AS v
) AS foo
) AS constants
WHERE att.attnum > 0 AND tbl.relkind='r'
GROUP BY 1,2,3,4,5
;

This is wrapped in a very simple select:

SELECT ma,bs,foo.nspname,foo.relname,
(datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
(maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
-- the query above
) AS foo;

And now it takes 4 minutes.

Rewriting it as a normal CTE doesn't change that:

with foo as (
-- the query above
)
SELECT ... FROM foo;

But a materialized CTE

with foo as materialized (
-- the query above
)
SELECT ... FROM foo;

is fast again.

Here are the execution plans of the first two queries (from explain (analyze)):

GroupAggregate (cost=7048.46..142133.79 rows=2500 width=188) (actual time=405.135..550.535 rows=625 loops=1)
Group Key: tbl.relname, ns.nspname
InitPlan 1
-> Result (cost=0.00..0.02 rows=1 width=32) (actual time=0.017..0.017 rows=1 loops=1)
-> Merge Left Join (cost=7048.44..7133.41 rows=8357 width=136) (actual time=404.807..434.347 rows=40460 loops=1)
Merge Cond: ((tbl.relname = s.tablename) AND (ns.nspname = s.schemaname) AND (att.attname = s.attname))
-> Sort (cost=2571.61..2592.50 rows=8357 width=192) (actual time=111.335..120.020 rows=40460 loops=1)
Sort Key: tbl.relname, ns.nspname, att.attname
Sort Method: external merge Disk: 8008kB
-> Hash Join (cost=137.38..2027.20 rows=8357 width=192) (actual time=57.024..75.192 rows=40460 loops=1)
Hash Cond: (tbl.relnamespace = ns.oid)
-> Hash Join (cost=136.29..1980.93 rows=8357 width=132) (actual time=0.617..13.882 rows=40460 loops=1)
Hash Cond: (att.attrelid = tbl.oid)
-> Seq Scan on pg_attribute att (cost=0.00..1719.62 rows=47572 width=68) (actual time=0.026..6.804 rows=47512 loops=1)
Filter: (attnum > 0)
Rows Removed by Filter: 10218
-> Hash (cost=128.47..128.47 rows=625 width=72) (actual time=0.564..0.565 rows=625 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 72kB
-> Seq Scan on pg_class tbl (cost=0.00..128.47 rows=625 width=72) (actual time=0.016..0.485 rows=625 loops=1)
Filter: (relkind = 'r'::"char")
Rows Removed by Filter: 2933
-> Hash (cost=1.04..1.04 rows=4 width=68) (actual time=56.360..56.360 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on pg_namespace ns (cost=0.00..1.04 rows=4 width=68) (actual time=56.319..56.325 rows=4 loops=1)
-> Sort (cost=4476.84..4477.18 rows=139 width=200) (actual time=293.367..300.227 rows=30921 loops=1)
Sort Key: s.tablename, s.schemaname, s.attname
Sort Method: external sort Disk: 6480kB
-> Subquery Scan on s (cost=2189.59..4471.89 rows=139 width=200) (actual time=207.797..253.934 rows=30921 loops=1)
-> Hash Left Join (cost=2189.59..4470.50 rows=139 width=469) (actual time=207.791..251.676 rows=30921 loops=1)
Hash Cond: (c.relnamespace = n.oid)
-> Hash Join (cost=2188.50..4468.65 rows=139 width=140) (actual time=207.723..246.208 rows=30921 loops=1)
Hash Cond: ((s_1.starelid = c.oid) AND (s_1.staattnum = a.attnum))
-> Seq Scan on pg_statistic s_1 (cost=0.00..2008.21 rows=15460 width=14) (actual time=0.024..6.845 rows=30921 loops=1)
Filter: (NOT stainherit)
-> Hash (cost=1899.86..1899.86 rows=19243 width=142) (actual time=207.320..207.323 rows=57730 loops=1)
Buckets: 65536 (originally 32768) Batches: 2 (originally 1) Memory Usage: 7681kB
-> Hash Join (cost=172.95..1899.86 rows=19243 width=142) (actual time=1.129..176.772 rows=57730 loops=1)
Hash Cond: (a.attrelid = c.oid)
Join Filter: has_column_privilege(c.oid, a.attnum, 'select'::text)
-> Seq Scan on pg_attribute a (cost=0.00..1575.30 rows=57730 width=70) (actual time=0.013..9.457 rows=57730 loops=1)
Filter: (NOT attisdropped)
-> Hash (cost=128.47..128.47 rows=3558 width=72) (actual time=0.993..0.994 rows=3558 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 394kB
-> Seq Scan on pg_class c (cost=0.00..128.47 rows=3558 width=72) (actual time=0.021..0.558 rows=3558 loops=1)
Filter: ((NOT relrowsecurity) OR (NOT row_security_active(oid)))
-> Hash (cost=1.04..1.04 rows=4 width=68) (actual time=0.023..0.024 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on pg_namespace n (cost=0.00..1.04 rows=4 width=68) (actual time=0.015..0.017 rows=4 loops=1)
SubPlan 2
-> Aggregate (cost=53.88..53.90 rows=1 width=8) (actual time=0.174..0.175 rows=1 loops=625)
-> Nested Loop (cost=0.86..53.87 rows=1 width=469) (actual time=0.058..0.172 rows=39 loops=625)
Join Filter: (s_2.starelid = c_1.oid)
-> Nested Loop (cost=0.57..53.42 rows=1 width=10) (actual time=0.011..0.076 rows=71 loops=625)
-> Nested Loop (cost=0.28..9.36 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=625)
Join Filter: (c_1.relnamespace = n_1.oid)
-> Index Scan using pg_class_relname_nsp_index on pg_class c_1 (cost=0.28..8.30 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=625)
Index Cond: (relname = tbl.relname)
Filter: ((NOT relrowsecurity) OR (NOT row_security_active(oid)))
-> Seq Scan on pg_namespace n_1 (cost=0.00..1.05 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=625)
Filter: (nspname = ns.nspname)
Rows Removed by Filter: 3
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a_1 (cost=0.29..43.98 rows=7 width=6) (actual time=0.005..0.064 rows=71 loops=625)
Index Cond: (attrelid = c_1.oid)
Filter: ((NOT attisdropped) AND has_column_privilege(c_1.oid, attnum, 'select'::text))
-> Index Scan using pg_statistic_relid_att_inh_index on pg_statistic s_2 (cost=0.29..0.44 rows=1 width=6) (actual time=0.001..0.001 rows=1 loops=44210)
Index Cond: ((starelid = a_1.attrelid) AND (staattnum = a_1.attnum))
Filter: (stanullfrac <> '0'::double precision)
Rows Removed by Filter: 0
Planning Time: 9.752 ms
JIT:
Functions: 115
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 6.386 ms (Deform 3.235 ms), Inlining 0.000 ms, Optimization 2.815 ms, Emission 53.702 ms, Total 62.903 ms
Execution Time: 592.748 ms

Subquery Scan on foo (cost=5370.24..142883.56 rows=2500 width=204) (actual time=256.040..292421.584 rows=625 loops=1)
-> GroupAggregate (cost=5370.24..142746.06 rows=2500 width=188) (actual time=256.022..292420.807 rows=625 loops=1)
Group Key: ns.nspname, tbl.relname
InitPlan 1
-> Result (cost=0.00..0.02 rows=1 width=32) (actual time=0.014..0.015 rows=1 loops=1)
-> Incremental Sort (cost=5370.23..7745.68 rows=8357 width=136) (actual time=255.739..292322.422 rows=40460 loops=1)
Sort Key: ns.nspname, tbl.relname
Presorted Key: ns.nspname
Full-sort Groups: 2 Sort Method: quicksort Average Memory: 34kB Peak Memory: 34kB
Pre-sorted Groups: 3 Sort Methods: quicksort, external merge Average Memory: 38kB Peak Memory: 88kB Average Disk: 1874kB Peak Disk: 5624kB
-> Merge Left Join (cost=4613.25..7180.30 rows=8357 width=136) (actual time=222.037..292242.701 rows=40460 loops=1)
Merge Cond: (ns.nspname = s.schemaname)
Join Filter: ((s.tablename = tbl.relname) AND (s.attname = att.attname))
Rows Removed by Join Filter: 1215045276
-> Nested Loop (cost=136.42..2515.44 rows=8357 width=192) (actual time=65.105..188.427 rows=40460 loops=1)
Join Filter: (tbl.relnamespace = ns.oid)
Rows Removed by Join Filter: 121380
-> Index Scan using pg_namespace_nspname_index on pg_namespace ns (cost=0.13..12.19 rows=4 width=68) (actual time=0.021..0.039 rows=4 loops=1)
-> Materialize (cost=136.29..2022.72 rows=8357 width=132) (actual time=16.114..36.529 rows=40460 loops=4)
-> Hash Join (cost=136.29..1980.93 rows=8357 width=132) (actual time=64.422..76.465 rows=40460 loops=1)
Hash Cond: (att.attrelid = tbl.oid)
-> Seq Scan on pg_attribute att (cost=0.00..1719.62 rows=47572 width=68) (actual time=63.875..69.458 rows=47512 loops=1)
Filter: (attnum > 0)
Rows Removed by Filter: 10218
-> Hash (cost=128.47..128.47 rows=625 width=72) (actual time=0.490..0.491 rows=625 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 72kB
-> Seq Scan on pg_class tbl (cost=0.00..128.47 rows=625 width=72) (actual time=0.038..0.410 rows=625 loops=1)
Filter: (relkind = 'r'::"char")
Rows Removed by Filter: 2933
-> Sort (cost=4476.84..4477.18 rows=139 width=200) (actual time=156.864..178628.030 rows=1215036357 loops=1)
Sort Key: s.schemaname
Sort Method: external sort Disk: 6480kB
-> Subquery Scan on s (cost=2189.59..4471.89 rows=139 width=200) (actual time=81.840..131.044 rows=30921 loops=1)
-> Hash Left Join (cost=2189.59..4470.50 rows=139 width=469) (actual time=81.835..128.788 rows=30921 loops=1)
Hash Cond: (c.relnamespace = n.oid)
-> Hash Join (cost=2188.50..4468.65 rows=139 width=140) (actual time=81.787..123.140 rows=30921 loops=1)
Hash Cond: ((s_1.starelid = c.oid) AND (s_1.staattnum = a.attnum))
-> Seq Scan on pg_statistic s_1 (cost=0.00..2008.21 rows=15460 width=14) (actual time=0.014..5.069 rows=30921 loops=1)
Filter: (NOT stainherit)
-> Hash (cost=1899.86..1899.86 rows=19243 width=142) (actual time=81.660..81.661 rows=57730 loops=1)
Buckets: 65536 (originally 32768) Batches: 2 (originally 1) Memory Usage: 7681kB
-> Hash Join (cost=172.95..1899.86 rows=19243 width=142) (actual time=1.119..59.096 rows=57730 loops=1)
Hash Cond: (a.attrelid = c.oid)
Join Filter: has_column_privilege(c.oid, a.attnum, 'select'::text)
-> Seq Scan on pg_attribute a (cost=0.00..1575.30 rows=57730 width=70) (actual time=0.010..7.991 rows=57730 loops=1)
Filter: (NOT attisdropped)
-> Hash (cost=128.47..128.47 rows=3558 width=72) (actual time=1.062..1.063 rows=3558 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 394kB
-> Seq Scan on pg_class c (cost=0.00..128.47 rows=3558 width=72) (actual time=0.065..0.636 rows=3558 loops=1)
Filter: ((NOT relrowsecurity) OR (NOT row_security_active(oid)))
-> Hash (cost=1.04..1.04 rows=4 width=68) (actual time=0.025..0.026 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on pg_namespace n (cost=0.00..1.04 rows=4 width=68) (actual time=0.016..0.018 rows=4 loops=1)
SubPlan 2
-> Aggregate (cost=53.88..53.90 rows=1 width=8) (actual time=0.146..0.146 rows=1 loops=625)
-> Nested Loop (cost=0.86..53.87 rows=1 width=469) (actual time=0.048..0.144 rows=39 loops=625)
Join Filter: (s_2.starelid = c_1.oid)
-> Nested Loop (cost=0.57..53.42 rows=1 width=10) (actual time=0.009..0.060 rows=71 loops=625)
-> Nested Loop (cost=0.28..9.36 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=625)
Join Filter: (c_1.relnamespace = n_1.oid)
-> Index Scan using pg_class_relname_nsp_index on pg_class c_1 (cost=0.28..8.30 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=625)
Index Cond: (relname = tbl.relname)
Filter: ((NOT relrowsecurity) OR (NOT row_security_active(oid)))
-> Seq Scan on pg_namespace n_1 (cost=0.00..1.05 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=625)
Filter: (nspname = ns.nspname)
Rows Removed by Filter: 3
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a_1 (cost=0.29..43.98 rows=7 width=6) (actual time=0.004..0.050 rows=71 loops=625)
Index Cond: (attrelid = c_1.oid)
Filter: ((NOT attisdropped) AND has_column_privilege(c_1.oid, attnum, 'select'::text))
-> Index Scan using pg_statistic_relid_att_inh_index on pg_statistic s_2 (cost=0.29..0.44 rows=1 width=6) (actual time=0.001..0.001 rows=1 loops=44210)
Index Cond: ((starelid = a_1.attrelid) AND (staattnum = a_1.attnum))
Filter: (stanullfrac <> '0'::double precision)
Rows Removed by Filter: 0
Planning Time: 5.314 ms
JIT:
Functions: 110
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 11.353 ms (Deform 5.059 ms), Inlining 0.000 ms, Optimization 5.227 ms, Emission 58.982 ms, Total 75.563 ms
Execution Time: 292436.103 ms

What is happening here?

The plans are clearly different, with the slow one claiming to sort 1.2
billion rows (but using only 6.5MB of disk space) despite the node below
it only returning 30921 rows. 1215036357 is almost but not quite
30921*40460, but that nested loop is a sibling of the sort, so it
shouldn't be included, right?

And why are the plans different at all? Computing a few extra values per
row shouldn't change the cost of the query delivering the rows, IMHO.
But then the costs are very similar, so maybe it's just some random
variation.

Some estimates are quite a bit off, even on the tables. I did ANALYZE
the whole database (and then the tables, again) during my tests.

hjp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp(at)hjp(dot)at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Rowley 2025-07-22 22:08:31 Re: Wrapping a select in another select makes it slower
Previous Message Greg Sabino Mullane 2025-07-21 19:26:37 Re: Request for Feedback on PostgreSQL HA + Load Balancing Architecture