Re: Fix gin index cost estimation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Fix gin index cost estimation
Date: 2022-09-16 20:04:59
Message-ID: 3871020.1663358699@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> writes:
> The attached does that and is much simpler. I only took into account
> entryPagesFetched, not sure if we should also charge something for data pages.

I think this is wrong, because there is already a CPU charge based on
the number of tuples visited, down at the very end of the routine:

*indexTotalCost += (numTuples * *indexSelectivity) * (cpu_index_tuple_cost + qual_op_cost);

It's certainly possible to argue that that's incorrectly computed,
but just adding another cost charge for the same topic can't be right.

I do suspect that that calculation is bogus, because it looks like it's
based on the concept of "apply the quals at each index entry", which we
know is not how GIN operates. So maybe we should drop that bit in favor
of a per-page-ish cost like you have here. Not sure. In any case it
seems orthogonal to the question of startup/descent costs. Maybe we'd
better tackle just one thing at a time.

(BTW, given that that charge does exist and is not affected by
repeated-scan amortization, why do we have a problem in the first place?
Is it just too small? I guess that when we're only expecting one tuple
to be retrieved, it would only add about cpu_index_tuple_cost.)

> Is the power(0.15) used an approximation for a log ? If so why ? Also
> shouldn't we round that up ?

No idea, but I'm pretty hesitant to just randomly fool with that equation
when (a) neither of us know where it came from and (b) exactly no evidence
has been provided that it's wrong.

I note for instance that the existing logic switches from charging 1 page
per search to 2 pages per search at numEntryPages = 15 (1.5 ^ (1/0.15)).
Your version would switch at 2 pages, as soon as the pow() result is even
fractionally above 1.0. Maybe the existing logic is too optimistic there,
but ceil() makes it too pessimistic I think. I'd sooner tweak the power
constant than change from rint() to ceil().

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2022-09-16 20:07:48 Re: Pruning never visible changes
Previous Message Peter Geoghegan 2022-09-16 19:24:03 Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans