Re: Fix gin index cost estimation

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Fix gin index cost estimation
Date: 2022-09-15 10:25:06
Message-ID: 4212593.ejJDZkT8p0@aivenronan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Le lundi 12 septembre 2022, 16:41:16 CEST Ronan Dunklau a écrit :
> But I realised that another approach might be better suited: since we want
to
> charge a cpu cost for every page visited, actually basing that on the
already
> estimated entryPagesFetched and dataPagesFetched would be better, instead of
> copying what is done for other indexes type and estimating the tree height.
It
> would be simpler, as we don't need to estimate the tree height anymore.
>
> I will submit a patch doing that.

The attached does that and is much simpler. I only took into account
entryPagesFetched, not sure if we should also charge something for data pages.

Instead of trying to estimate the height of the tree, we rely on the
(imperfect) estimation of the number of entry pages fetched, and charge 50
times cpu_operator_cost to that, in addition to the cpu_operator_cost charged
per entry visited.

I also adapted to take into accounts multiple scans induced by scalar array
operations.

As it is, I don't understand the following calculation:

/*
* Estimate number of entry pages read. We need to do
* counts.searchEntries searches. Use a power function as it should be,
* but tuples on leaf pages usually is much greater. Here we include all
* searches in entry tree, including search of first entry in partial
* match algorithm
*/
entryPagesFetched += ceil(counts.searchEntries * rint(pow(numEntryPages,
0.15)));

Is the power(0.15) used an approximation for a log ? If so why ? Also
shouldn't we round that up ?
It seems to me it's unlikely to affect the total too much in normal cases
(adding at worst random_page_cost) but if we start to charge cpu operator
costs as proposed here it makes a big difference and it is probably
safer to overestimate a bit than the opposite.

With those changes, the gin cost (purely cpu-wise) stays above the btree one
as I think it should be.

--
Ronan Dunklau

Attachment Content-Type Size
v2-0001-Fix-gin-costing.patch text/x-patch 3.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2022-09-15 10:25:54 Re: Assertion failure in WaitForWALToBecomeAvailable state machine
Previous Message Simon Riggs 2022-09-15 10:04:14 Re: SUBTRANS: Minimizing calls to SubTransSetParent()