Help needed to understand query planner regression with incremental sort

From: "Henrik Peinar (nodeSWAT(dot)com)" <henrik(at)nodeswat(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Help needed to understand query planner regression with incremental sort
Date: 2023-05-31 10:46:37
Message-ID: CAEwi-F_OG=j1GGTdqJHFeEiSRSWuQpEnDdtvbF+ykekCoocsTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

*Quick description: *After upgrading our Aurora PostgreSQL engine from v11
to v15, one of our often run SQL's started taking tens of seconds instead
of running sub 100ms. Explain analyze showed that the query planner had
switched to using incremental sort. Running ANALYZE on the table fixed the
issue temporarily, but it came back after few days.

I'm looking to dig a bit deeper to be able to submit a bug report to the
PostgreSQL team so maybe something can be done to fix this, as this
definitely feels as a bug (the query running times are 100x in difference)
but I need a bit of help as I'm new to this deep PostgreSQL analysis and
I'm struggling finding resource how to proceed in such a case. And maybe
it's an issue with my own setup or database configuration instead (ie. me
just being stupid).

I know that I can solve the issue for my own environments with turning
incremental sort off in the DB parameters, but I feel this might be worth
putting a bit time into to be able to find the root cause and help me
understand the planner a bit better.

*More detailed description:*
Note: I don't have reproduction steps at this time, this is one of the
reasons I'm asking help to figure out what could I try to do to start
trying reproducing this in local setup. Below info is just giving out as
much info as I can to help understand the setup I have.

The query under question is rather simplistic as far as SQL itself goes at
least (attached: original_query.sql):

*select * from "products" inner join "shops" on "shops"."id" =
"products"."shop_id" where "products"."status" = 'published' and
products.desc_tsv @@ to_tsquery('simple', 'nike:*') order by
"products"."pinned_at" ASC, "products"."listed_at" DESC limit 61;*
With normal running conditions this query produces the following query
plan: https://explain.dalibo.com/plan/af8ch7a59ch6459d (attached:
good_planner_path.txt)

With incremental sorting path, it produces the following
https://explain.dalibo.com/plan/428564152aa3ba37 (attached:
incremental_sort_planner_path.txt)

I've tried to include as slim table / index setup as possible, but the
tables are actually bigger and there are a lot more indexes present, I
don't know how other indexes and / or columns might affect the planner path
in this case.
https://dbdiagram.io/d/64771e48722eb7749422715e (attached: schema_setup.sql)

In production DB the size of products table is around 5 452 840 rows.
According to the information_schema the size of the products table is
around 8.1GB.

If the incremental sort query plan starts happening, running ANALYZE
products; fixes the issue and the planner switches back to the original
plan.

*What am I asking for?*
Any ideas / help / queries that would get me closer to understanding why
planner chooses so much slower query in this case or ideas how to start
trying to reproduce this locally as it seems to happen intermittently in
production environment, I'm unsure if I can replicate that type of load
locally. Maybe someone spots very obvious mistakes / issues with the table
/ index / query setup right away that I've missed.

Any help appreciated,
Henrik.

Attachment Content-Type Size
incremental_sort_planner_path.txt text/plain 1.5 KB
original_query.sql application/octet-stream 259 bytes
good_planner_path.txt text/plain 747 bytes
schema_setup.sql application/octet-stream 1.8 KB

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2023-05-31 11:00:10 Re: Pg 16: will pg_dump & pg_restore be faster?
Previous Message shveta malik 2023-05-31 09:40:34 Re: Support logical replication of DDLs