Re: [RFC][PATCH] Order qual clauses by combined cost and selectivity

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Staroverov Ilja <i(dot)staroverov(at)ftdata(dot)ru>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [RFC][PATCH] Order qual clauses by combined cost and selectivity
Date: 2026-04-22 21:47:42
Message-ID: 1325978.1776894462@sss.pgh.pa.us
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Staroverov Ilja <i(dot)staroverov(at)ftdata(dot)ru> writes:
> The attached patch changes the ranking heuristic to use
> cost / (1 - selectivity)
> where selectivity is the fraction of rows that pass the clause.

This (or some close relative) has been proposed before, but we
have been hesitant to do it because our cost metrics for qual
clauses are pretty nearly completely bogus: practically all
the built-in functions are assigned cost 1, even though in
reality they have a wide range of runtimes. Selectivity isn't
enormously reliable either. We could easily be taking a qual
order that the user has chosen carefully and stirring it around
more or less at random.

I'm suspicious of the particular form of this expression, too,
because selectivities close to 1 will produce very substantial
effects on the estimate even though there may not be that much
difference in practice, and the selectivity difference may be
mostly sampling error in the first place. I think you need
a formula that's not very sensitive to small differences, but
this will fail that test.

We had a similar discussion about two years ago concerning a
patch that (IIRC) tried to order sort columns according to
the estimated cost of the comparison functions. That got
reverted for a few reasons, but one of the big ones was that
the cost comparisons were largely garbage-in-garbage-out.

I think that a prerequisite for any work in this area is to
try to assign more realistic procost estimates to at least
a substantial fraction of the built-in pg_proc entries.
That's going to be tedious and probably contentious, but
it's hard to believe we can make much progress without
better cost data.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2026-04-22 22:10:07 Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times
Previous Message David Geier 2026-04-22 21:26:56 Re: Reduce build times of pg_trgm GIN indexes