Re: Fix gin index cost estimation

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Fix gin index cost estimation
Date: 2022-09-12 14:41:16
Message-ID: 1924159.usQuhbGJ8B@aivenronan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you for looking at it.

> I looked this over briefly. I think you are correct to charge an
> initial-search cost per searchEntries count, but don't we also need to
> scale up by arrayScans, similar to the "corrections for cache effects"?
> + * We model index descent costs similarly to those for btree, but
we also
> + * need an idea of the tree_height.
> + * We use numEntries / numEntryPages as the fanout factor.
> I'm not following that calculation? It seems like it'd be correct
> only for a tree height of 1, although maybe I'm just misunderstanding
> this (overly terse, perhaps) comment.

I don't really understand why that would work only with a tree height of one ?
Every entry page contains a certain amount of entries, and as such computing
the average number of entries per page seems to be a good approximation for
the fanout. But I may have misunderstood what was done in other index types.

For consistency, maybe we should just use a hard coded value of 100 for the
fanout factor, similarly to what we do for other index types.

But I realised that another approach might be better suited: since we want to
charge a cpu cost for every page visited, actually basing that on the already
estimated entryPagesFetched and dataPagesFetched would be better, instead of
copying what is done for other indexes type and estimating the tree height. It
would be simpler, as we don't need to estimate the tree height anymore.

I will submit a patch doing that.

> + * We charge descentCost once for every entry
> + */
> + if (numTuples > 1)
> + {
> + descentCost = ceil(log(numTuples) / log(2.0)) *
> + *indexStartupCost += descentCost *
> + }
> I had to read this twice before absorbing the point of the numTuples
> test. Maybe help the reader a bit:
> + if (numTuples > 1) /* ensure positive log() */

Ok. On second read, I think that part was actually wrong: what we care about
is not the number of tuples here, but the number of entries.

> Personally I'd duplicate the comments from nbtreecostestimate rather
> than just assuming the reader will go consult them. For that matter,
> why didn't you duplicate nbtree's logic for charging for SA scans?
> This bit seems just as relevant for GIN:
> * If there are ScalarArrayOpExprs, charge this once per SA scan.
> * ones after the first one are not startup cost so far as the
> * plan is concerned, so add them only to "total" cost.

You're right. So what we need to do here is scale up whatever we charge for
the startup cost by the number of arrayscans for the total cost.

> Keep in mind also that pgindent will have its own opinions about how to
> format these comments, and it can out-stubborn you. Either run the
> comments into single paragraphs, or if you really want them to be two
> paras then leave an empty comment line between. Another formatting
> nitpick is that you seem to have added a number of unnecessary blank
> lines.

Thanks, noted.

I'll submit a new patch soon, as soon as i've resolved some of the problems I
have when accounting for scalararrayops.

Best regards,

Ronan Dunklau

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-09-12 14:51:57 Re: list of acknowledgments for PG15
Previous Message Erik Rijkers 2022-09-12 14:18:41 Re: proposal: possibility to read dumped table's name from file