Re: Problems with get_actual_variable_range's VISITED_PAGES_LIMIT

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, MARK CALLAGHAN <mdcallag(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>
Subject: Re: Problems with get_actual_variable_range's VISITED_PAGES_LIMIT
Date: 2026-02-10 09:26:22
Message-ID: CAKZiRmwjmGBWvPx6=qFRXJhm=sCrUyE7pPiPGA454LDdKemtQg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 10, 2026 at 12:15 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> get_actual_variable_range() scans an index to find actual min/max
> values for selectivity estimation. Since this happens during planning,
> we can't afford to spend too much time on it. We use
> VISITED_PAGES_LIMIT (100 heap pages) as a threshold -- once we've
> visited that many heap pages, we give up and fall back to pg_statistic
> values. See commit 9c6ad5eaa9 for background information.
[..]
> Thoughts?

FWIW, dunno if that helps from what I remember, the 9c6ad5eaa957 itself was
born literally just like in 5 minutes just as very quick bandage attempt to
just rescue some big OLTP database (in backpatchable way), so there no was
serious research involved back then. AFAIR the alternative idea was to simply
to cage it into max time limit (using time delta), but that was same
basically the same idea. The 100 tuples there translate directly to
WE_HAVE_NO_IDEA, so I think you could simply fallback also N*100 index
index (instead of heap) pages and it would still be good enough - why? Well
because those below are the timing literally from the case that caused
this back then
(not kidding):
Planning Time: 1620687.653 ms
Execution Time: 970.716 ms

I just worry that if get_actual_variable_range() starts reporting much worse
estimates what do we do then? Well maybe, we'll have pg_plan_advice then,
or maybe the depth (time?) of search should be somehow bound to the column's
statistics_target too, but on second thought it looks like it should be more
property of the query itself (so maybe some GUC?)

-J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mayrom Rabinovich 2026-02-10 09:29:21 Re: Planing edge case for sorts with limit on non null column
Previous Message Ronan Dunklau 2026-02-10 09:15:51 Re: Exit walsender before confirming remote flush in logical replication