Quick Links

Re: Problems with get_actual_variable_range's VISITED_PAGES_LIMIT

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, MARK CALLAGHAN <mdcallag(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>
Subject:	Re: Problems with get_actual_variable_range's VISITED_PAGES_LIMIT
Date:	2026-03-10 22:08:35
Message-ID:	4s3vd2rjll4vgnhdnajrqwsnhnmnmtmjtwjngfsrzmjx3nrxvp@la2vohpjojnk
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 2026-03-10 17:03:14 -0400, Peter Geoghegan wrote:
> On Tue, Feb 10, 2026 at 4:26 AM Jakub Wartak
> <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> > I just worry that if get_actual_variable_range() starts reporting much worse
> > estimates what do we do then? Well maybe, we'll have pg_plan_advice then,
> > or maybe the depth (time?) of search should be somehow bound to the column's
> > statistics_target too, but on second thought it looks like it should be more
> > property of the query itself (so maybe some GUC?)
>
> The purpose of get_actual_variable_range is to get the extremal value
> in the index -- a single value from a single tuple. If we can't manage
> that by reading only 2 or 3 pages, then I see no good reason to
> believe we can by reading many more pages.
>
> In other words, we're perfectly justified in making a soft expectation
> that it'll require very little work to get an extremal value. But once
> we notice that our soft assumption doesn't hold, all bets are off --
> we're then justified in giving up relatively quickly. Having several
> contiguous pages that are completely empty (or empty of
> non-LP_DEAD-marked tuples) at the extreme leftmost or rightmost en of
> the index is absolutely a pathological case. It strongly suggests a
> queue-like workload of the kind that my testing shows that
> VISITED_PAGES_LIMIT can't deal with sensibly.

You don't need a queuing workload to occasionally have a few pages of dead
tuples at one end of the value range. E.g. a small batch insert that rolls
back due to a unique conflict is enough. The insert case is also the one
where, without get_actual_variable_range(), we will often end up with
completely bogus estimates, as obviously the stored stats won't yet know about
newly inserted data.

> get_actual_variable_range exists to ascertain information about a
> particular index. To me, it makes perfect sense that a mechanism such
> as this would work in terms of costs paid on the index AM side. (The
> heap fetches matter a great deal too, of course, but it's reasonable
> to use index costs as a proxy for heap/table AM costs -- but it's not
> reasonable to use table/heap AM costs as a proxy for index AM costs.
> It doesn't work both ways.)

I am not convinced it's true either direction - tids from three leaf pages of
tids can point to a very small number of heap pages or hundreds of heap
pages. Why is the index AM cost a good proxy for the table? If anything it's
more sane the other way round (i.e. how it is today).

Greetings,

Andres Freund

In response to

Re: Problems with get_actual_variable_range's VISITED_PAGES_LIMIT at 2026-03-10 21:03:14 from Peter Geoghegan

Responses

Re: Problems with get_actual_variable_range's VISITED_PAGES_LIMIT at 2026-03-13 01:13:40 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Heikki Linnakangas	2026-03-10 22:09:14	Re: Fix uninitialized xl_running_xacts padding
Previous Message	Zsolt Parragi	2026-03-10 22:05:25	Re: [PATCH] Add pg_get_database_ddl() function to reconstruct CREATE DATABASE statement