From: | Marco Boeringa <marco(at)boeringa(dot)demon(dot)nl> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org, Thom Brown <thom(at)linux(dot)com> |
Subject: | Re: Potential "AIO / io workers" inter-worker locking issue in PG18? |
Date: | 2025-10-19 20:37:08 |
Message-ID: | 8798d95e-ed5c-46ad-8d22-26bcd0885fc2@boeringa.demon.nl |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Hi Andres,
I have now been able to capture an actual bad plan using PostgreSQL's
'auto_explain' option, so this the bad plan that makes a process and job
that should take just a few seconds, cost more than 7 hours to execute.
And this is of course without updating the tables statistics with
ANALYZE after adding the primary key objectid, as that solved the issue
in PG18. It is now clear where PostgreSQL spends all of its time when
the bad plan is generated, see the loop number of the second index scan.
I am not sure why in this nested loop, two index scans on essentially
the same key and table are executed. You can compare this bad plan with
the one below it, that was generated with EXPLAIN in pgAdmin, not from
actual auto_explain output, but shows a different query execution plan.
Note again:
- I did not see the bad plan in PG<=17, even without the now added
ANALYZE step that solves the issue in PG18.
- The two tables involved ('landcover_grassy_small_scale_2_ply' and
'landcover_grassy_small_scale_2_ply_pg') both have primary keys and
unique objectid indexes.
- The join is one-to-one. 'landcover_grassy_small_scale_2_ply_pg' has
unique values for objectids, so only record per corresponding record
in 'landcover_grassy_small_scale_2_ply'
- The 'osm_tmp_28232_ch3' references a database view and represents the
chunk / multi-threaded job that needs to be executed from my Python
code, and references a selection of
the 'landcover_grassy_small_scale_2_ply_pg' table's records, which is
the filter in the last index scan in the bad plan below.
Marco
*** ACTUAL BAD PLAN AS CAPTURED BY auto_explain ***:
2025-10-17 23:32:17.375 CEST [242803] osm(at)mini_test LOG: duration:
27133980.536 ms plan:
Query Text: UPDATE osm.landcover_grassy_small_scale_2_ply AS t1 SET
area_geo = t2.area_geo, perim_geo = t2.perim_geo, compact_geo = CASE
WHEN t2.area_geo > 0 THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 *
pi())) ELSE 0 END, npoints_geo = t2.npoints_geo, comp_npoints_geo = CASE
WHEN t2.npoints_geo > 0 THEN (CASE WHEN t2.area_geo > 0 THEN
((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi())) ELSE 0 END /
t2.npoints_geo) ELSE 0 END, convex_ratio_geo = CASE WHEN
ST_Area(ST_ConvexHull(way)::geography,true) > 0 THEN (t2.area_geo /
ST_Area(ST_ConvexHull(way)::geography,true)) ELSE 1 END FROM (SELECT
objectid,ST_Area(way::geography,true) AS
area_geo,ST_Perimeter(way::geography,true) AS perim_geo,ST_NPoints(way)
AS npoints_geo FROM osm.landcover_grassy_small_scale_2_ply) AS t2 WHERE
(t2.objectid = t1.objectid) AND t1.objectid IN (SELECT t3.objectid FROM
mini_test.osm.osm_tmp_28232_ch3 AS t3)
Update on osm.landcover_grassy_small_scale_2_ply t1
(cost=1.17..134.56 rows=0 width=0) (actual
time=27133980.532..27133980.534 rows=0.00 loops=1)
Buffers: shared hit=7585627828
-> Nested Loop (cost=1.17..134.56 rows=1 width=62) (actual
time=225948.297..27133979.894 rows=8.00 loops=1)
Output:
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true),
st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography, true),
CASE WHEN (st_area((landcover_grassy_small_scale_2_ply.way)::geography,
true) > '0'::double precision) THEN
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography,
true), '2'::double precision) /
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) /
'12.566370614359172'::double precision) ELSE '0'::double precision END,
st_npoints(landcover_grassy_small_scale_2_ply.way), CASE WHEN
(st_npoints(landcover_grassy_small_scale_2_ply.way) > 0) THEN (CASE WHEN
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) >
'0'::double precision) THEN
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography,
true), '2'::double precision) /
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) /
'12.566370614359172'::double precision) ELSE '0'::double precision END /
(st_npoints(landcover_grassy_small_scale_2_ply.way))::double precision)
ELSE '0'::double precision END, CASE WHEN
(st_area((st_convexhull(t1.way))::geography, true) > '0'::double
precision) THEN
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) /
st_area((st_convexhull(t1.way))::geography, true)) ELSE '1'::double
precision END, t1.ctid, landcover_grassy_small_scale_2_ply.ctid,
landcover_grassy_small_scale_2_ply_pg.ctid
Inner Unique: true
Join Filter: (landcover_grassy_small_scale_2_ply.objectid =
landcover_grassy_small_scale_2_ply_pg.objectid)
Buffers: shared hit=7585627705
-> Nested Loop (cost=0.75..4.79 rows=1 width=620) (actual
time=5.784..27131099.372 rows=222396.00 loops=1)
Output: t1.way, t1.ctid, t1.objectid,
landcover_grassy_small_scale_2_ply.way,
landcover_grassy_small_scale_2_ply.ctid,
landcover_grassy_small_scale_2_ply.objectid
Inner Unique: true
Join Filter: (t1.objectid =
landcover_grassy_small_scale_2_ply.objectid)
Rows Removed by Join Filter: 24729879210
Buffers: shared hit=7584737792
-> Index Scan using
landcover_grassy_small_scale_2_ply_pkey on
osm.landcover_grassy_small_scale_2_ply t1 (cost=0.38..2.39 rows=1
width=310) (actual time=5.176..462.613 rows=222396.00 loops=1)
Output: t1.way, t1.ctid, t1.objectid
Index Searches: 1
Buffers: shared hit=66411
-> Index Scan using
landcover_grassy_small_scale_2_ply_pkey on
osm.landcover_grassy_small_scale_2_ply (cost=0.38..2.39 rows=1
width=310) (actual time=0.035..114.709 rows=111198.50 loops=222396)
Output: landcover_grassy_small_scale_2_ply.way,
landcover_grassy_small_scale_2_ply.ctid,
landcover_grassy_small_scale_2_ply.objectid
Index Searches: 222396
Buffers: shared hit=7584671381
-> Index Scan using idx_osm_41 on
osm.landcover_grassy_small_scale_2_ply_pg (cost=0.42..2.44 rows=1
width=14) (actual time=0.009..0.009 rows=0.00 loops=222396)
Output: landcover_grassy_small_scale_2_ply_pg.ctid,
landcover_grassy_small_scale_2_ply_pg.objectid
Index Cond:
(landcover_grassy_small_scale_2_ply_pg.objectid = t1.objectid)
Filter:
((landcover_grassy_small_scale_2_ply_pg.page_number >= 31276) AND
(landcover_grassy_small_scale_2_ply_pg.page_number < 31766))
Rows Removed by Filter: 1
Index Searches: 222396
Buffers: shared hit=889584
*** LIKELY GOOD PLAN AS GENERATED BY CAPTURING THE SQL AND RUNNING
'EXPLAIN' IN pgAdmin ***:
"[
{
""Plan"": {
""Node Type"": ""ModifyTable"",
""Operation"": ""Update"",
""Parallel Aware"": false,
""Async Capable"": false,
""Relation Name"": ""landcover_grassy_small_scale_2_ply"",
""Schema"": ""osm"",
""Alias"": ""t1"",
""Startup Cost"": 1.14,
""Total Cost"": 9129.99,
""Plan Rows"": 0,
""Plan Width"": 0,
""Disabled"": false,
""Plans"": [
{
""Node Type"": ""Nested Loop"",
""Parent Relationship"": ""Outer"",
""Parallel Aware"": false,
""Async Capable"": false,
""Join Type"": ""Inner"",
""Startup Cost"": 1.14,
""Total Cost"": 9129.99,
""Plan Rows"": 70,
""Plan Width"": 62,
""Disabled"": false,
""Output"":
[""st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)"",
""st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography,
true)"", ""CASE WHEN
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) >
'0'::double precision) THEN
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography,
true), '2'::double precision) /
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) /
'12.566370614359172'::double precision) ELSE '0'::double precision
END"", ""st_npoints(landcover_grassy_small_scale_2_ply.way)"", ""CASE
WHEN (st_npoints(landcover_grassy_small_scale_2_ply.way) > 0) THEN (CASE
WHEN (st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)
> '0'::double precision) THEN
((power(st_perimeter((landcover_grassy_small_scale_2_ply.way)::geography,
true), '2'::double precision) /
st_area((landcover_grassy_small_scale_2_ply.way)::geography, true)) /
'12.566370614359172'::double precision) ELSE '0'::double precision END /
(st_npoints(landcover_grassy_small_scale_2_ply.way))::double precision)
ELSE '0'::double precision END"", ""CASE WHEN
(st_area((st_convexhull(t1.way))::geography, true) > '0'::double
precision) THEN
(st_area((landcover_grassy_small_scale_2_ply.way)::geography, true) /
st_area((st_convexhull(t1.way))::geography, true)) ELSE '1'::double
precision END"", ""t1.ctid"",
""landcover_grassy_small_scale_2_ply.ctid"",
""landcover_grassy_small_scale_2_ply_pg.ctid""],
""Inner Unique"": true,
""Plans"": [
{
""Node Type"": ""Nested Loop"",
""Parent Relationship"": ""Outer"",
""Parallel Aware"": false,
""Async Capable"": false,
""Join Type"": ""Inner"",
""Startup Cost"": 0.72,
""Total Cost"": 173.50,
""Plan Rows"": 70,
""Plan Width"": 828,
""Disabled"": false,
""Output"": [""t1.way"", ""t1.ctid"", ""t1.objectid"",
""landcover_grassy_small_scale_2_ply_pg.ctid"",
""landcover_grassy_small_scale_2_ply_pg.objectid""],
""Inner Unique"": true,
""Plans"": [
{
""Node Type"": ""Index Scan"",
""Parent Relationship"": ""Outer"",
""Parallel Aware"": false,
""Async Capable"": false,
""Scan Direction"": ""Forward"",
""Index Name"": ""idx_osm_35"",
""Relation Name"":
""landcover_grassy_small_scale_2_ply_pg"",
""Schema"": ""osm"",
""Alias"": ""landcover_grassy_small_scale_2_ply_pg"",
""Startup Cost"": 0.29,
""Total Cost"": 3.70,
""Plan Rows"": 70,
""Plan Width"": 14,
""Disabled"": false,
""Output"":
[""landcover_grassy_small_scale_2_ply_pg.ctid"",
""landcover_grassy_small_scale_2_ply_pg.objectid""],
""Index Cond"":
""((landcover_grassy_small_scale_2_ply_pg.page_number >= 28873) AND
(landcover_grassy_small_scale_2_ply_pg.page_number < 29373))""
},
{
""Node Type"": ""Index Scan"",
""Parent Relationship"": ""Inner"",
""Parallel Aware"": false,
""Async Capable"": false,
""Scan Direction"": ""Forward"",
""Index Name"":
""landcover_grassy_small_scale_2_ply_pkey"",
""Relation Name"":
""landcover_grassy_small_scale_2_ply"",
""Schema"": ""osm"",
""Alias"": ""t1"",
""Startup Cost"": 0.42,
""Total Cost"": 2.43,
""Plan Rows"": 1,
""Plan Width"": 814,
""Disabled"": false,
""Output"": [""t1.way"", ""t1.ctid"", ""t1.objectid""],
""Index Cond"": ""(t1.objectid =
landcover_grassy_small_scale_2_ply_pg.objectid)""
}
]
},
{
""Node Type"": ""Index Scan"",
""Parent Relationship"": ""Inner"",
""Parallel Aware"": false,
""Async Capable"": false,
""Scan Direction"": ""Forward"",
""Index Name"": ""landcover_grassy_small_scale_2_ply_pkey"",
""Relation Name"": ""landcover_grassy_small_scale_2_ply"",
""Schema"": ""osm"",
""Alias"": ""landcover_grassy_small_scale_2_ply"",
""Startup Cost"": 0.42,
""Total Cost"": 0.64,
""Plan Rows"": 1,
""Plan Width"": 814,
""Disabled"": false,
""Output"": [""landcover_grassy_small_scale_2_ply.way"",
""landcover_grassy_small_scale_2_ply.ctid"",
""landcover_grassy_small_scale_2_ply.objectid""],
""Index Cond"":
""(landcover_grassy_small_scale_2_ply.objectid = t1.objectid)""
}
]
}
]
}
}
]"
Op 12-10-2025 om 10:24 schreef Marco Boeringa:
> Hi Andres,
>
> I have been doing a bit more investigation. As I explained before, the
> problematic multi-threaded geoprocessing step is not some stand-alone
> query that can be easily reduced to small easily portable reproducible
> case with attached data. In fact, this geoprocessing step is part of a
> large custom build Python geoprocessing workflow, with total code
> probably in the 25k code lines range.
>
> However, based on the apparent poor query plan in PG18 / PostGIS
> 3.6.0, I now reviewed the exact code once more. I noticed that just
> before entering the multi-threaded code that emits the queries as seen
> below, I am actually adding the primary key field 'objectid' as
> "GENERATED BY DEFAULT AS IDENTITY" to the
> 'osm.landcover_scrubs_small_scale_2_ply' table.
>
> Now I also noticed I did not run ANALYZE after that against the same
> table. I have now added this to the code. Although it is still
> preliminary, first tests seem to indicate that this resolves the
> issue, and prevents the stalls or better said apparent hugely
> inefficient query plan (remember: a < 10 sec process was turned into
> multi-hour). I still need to do more thorough testing to be sure though.
>
> However, this raises a couple of question:
>
> - While ANALYZE is of course hugely important for proper statistics
> and query planning, I have wondered if PostgreSQL shouldn't
> automatically have updated the statistics for the addition of the
> primary key with IDENTITY? It seems to me that based on the definition
> of the primary key column and IDENTITY and table size, the actual
> distribution of values is essentially already known even before any
> sampling of ANALYZE to update statistics?
>
> - Am I right to assume that only the statistics on the objectid field
> play any role in this issue? As you can see, the WHERE clause does not
> involve any other fields than the two objectid fields of the main
> table and the chunk table specifying the job. All other values
> computed are just derived straight from the geometry column.
>
> - Were there any hints in the all the other data I supplied as to
> where PG18's query planning without the updated statistics of the new
> ANALYZE step added, is going wrong? And why this was never an issue in
> <= PG17?
>
> I also did some preliminary test with the old PG17.6 / PostgGIS 3.6.0
> cluster with the same Italy extract data. I still need to do more
> thorough testing, both with and without the extra ANALYZE step, to
> fully exclude that there isn't something related to the upgrade to
> PostGIS 3.6.0, but first indications are as I already saw with
> the PG17.6 / PostgGIS 3.5.3, that there are no issue with <= PG17 /
> PostGIS combination as regards this apparent planner issue.
>
> Marco
>
> Op 8-10-2025 om 21:08 schreef Andres Freund:
>> In [1] you showed a query. Reformated that looks like this:
>>
>> UPDATE osm.landcover_scrubs_small_scale_2_ply AS t1
>> SET area_geo = t2.area_geo,
>> perim_geo = t2.perim_geo,
>> compact_geo = CASE WHEN t2.area_geo > 0 THEN
>> ((power(t2.perim_geo,2) / t2.area_geo) / (4 * pi())) ELSE 0 END,
>> npoints_geo = t2.npoints_geo,
>> comp_npoints_geo = CASE WHEN t2.npoints_geo > 0 THEN (CASE WHEN
>> t2.area_geo > 0 THEN ((power(t2.perim_geo,2) / t2.area_geo) / (4 *
>> pi())) ELSE 0 END / t2.npoints_geo) ELSE 0 END,
>> convex_ratio_geo = CASE WHEN
>> ST_Area(ST_ConvexHull(way)::geography,true) > 0 THEN (t2.area_geo /
>> ST_Area(ST_ConvexHull(way)::geography,true)) ELSE 1 END
>> FROM (
>> SELECT
>> objectid,
>> ST_Area(way::geography,true) AS area_geo,
>> ST_Perimeter(way::geography,true) AS perim_geo,
>> ST_NPoints(way) AS npoints_geo
>> FROM osm.landcover_scrubs_small_scale_2_ply) AS t2
>> WHERE (t2.objectid = t1.objectid)
>> AND t1.objectid IN (SELECT t3.objectid FROM
>> mini_test.osm.osm_tmp_28128_ch5 AS t3)
>>
>>
>> Which certainly fits with two nested loops, although I don't think I
>> can infer
>> which order it the joins are in.
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-10-19 22:30:38 | Re: Issues with blocksize smaller than 8KB |
Previous Message | Tom Lane | 2025-10-19 20:31:55 | Re: Issues with blocksize smaller than 8KB |