|From:||Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>|
|To:||Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>|
|Subject:||Re: Fix gin index cost estimation|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
Le vendredi 16 septembre 2022, 22:04:59 CEST Tom Lane a écrit :
> Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> writes:
> > The attached does that and is much simpler. I only took into account
> > entryPagesFetched, not sure if we should also charge something for data
> I think this is wrong, because there is already a CPU charge based on
> the number of tuples visited, down at the very end of the routine:
> *indexTotalCost += (numTuples * *indexSelectivity) *
(cpu_index_tuple_cost + qual_op_cost);
> It's certainly possible to argue that that's incorrectly computed,
> but just adding another cost charge for the same topic can't be right.
I don't think it's the same thing. The entryPagesFetched is computed
independently of the selectivity and the number of tuples. As such, I think it
makes sense to use it to compute the cost of descending the entry tree.
As mentioned earlier, I don't really understand the formula for computing
entryPagesFetched. If we were to estimate the tree height to compute the
descent cost as I first proposed, I feel like we would use two different metrics
for what is essentially the same cost: something proportional to the size of
the entry tree.
> I do suspect that that calculation is bogus, because it looks like it's
> based on the concept of "apply the quals at each index entry", which we
> know is not how GIN operates. So maybe we should drop that bit in favor
> of a per-page-ish cost like you have here. Not sure. In any case it
> seems orthogonal to the question of startup/descent costs. Maybe we'd
> better tackle just one thing at a time.
Hum, good point. Maybe that should be revisited too.
> (BTW, given that that charge does exist and is not affected by
> repeated-scan amortization, why do we have a problem in the first place?
> Is it just too small? I guess that when we're only expecting one tuple
> to be retrieved, it would only add about cpu_index_tuple_cost.)
Because with a very low selectivity, we end up under-charging for the cost of
walking the entry tree by a significant amount. As said above, I don't see how
those two things are the same: that charge estimates the cost of applying
index quals to the visited tuples, which is not the same as charging per entry
> > Is the power(0.15) used an approximation for a log ? If so why ? Also
> > shouldn't we round that up ?
> No idea, but I'm pretty hesitant to just randomly fool with that equation
> when (a) neither of us know where it came from and (b) exactly no evidence
> has been provided that it's wrong.
> I note for instance that the existing logic switches from charging 1 page
> per search to 2 pages per search at numEntryPages = 15 (1.5 ^ (1/0.15)).
> Your version would switch at 2 pages, as soon as the pow() result is even
> fractionally above 1.0. Maybe the existing logic is too optimistic there,
> but ceil() makes it too pessimistic I think. I'd sooner tweak the power
> constant than change from rint() to ceil().
You're right, I was too eager to try to raise the CPU cost proportionnally to
the number of array scans (scalararrayop). I'd really like to understand where
this equation comes from though...
|Next Message||Thomas Munro||2022-09-19 07:30:09||Re: Tree-walker callbacks vs -Wdeprecated-non-prototype|
|Previous Message||Peter Smith||2022-09-19 06:51:31||Re: Data is copied twice when specifying both child and parent table in publication|