Re: Fix gin index cost estimation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Fix gin index cost estimation
Date: 2022-09-08 22:32:56
Message-ID: 4157141.1662676376@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> writes:
> The problem I'm trying to solve is that, contrary to btree, gist and sp-gist
> indexes, gin indexes do not charge any cpu-cost for descending the entry tree.

I looked this over briefly. I think you are correct to charge an
initial-search cost per searchEntries count, but don't we also need to
scale up by arrayScans, similar to the "corrections for cache effects"?

+ * We model index descent costs similarly to those for btree, but we also
+ * need an idea of the tree_height.
+ * We use numEntries / numEntryPages as the fanout factor.

I'm not following that calculation? It seems like it'd be correct
only for a tree height of 1, although maybe I'm just misunderstanding
this (overly terse, perhaps) comment.

+ * We charge descentCost once for every entry
+ */
+ if (numTuples > 1)
+ {
+ descentCost = ceil(log(numTuples) / log(2.0)) * cpu_operator_cost;
+ *indexStartupCost += descentCost * counts.searchEntries;
+ }

I had to read this twice before absorbing the point of the numTuples
test. Maybe help the reader a bit:

+ if (numTuples > 1) /* ensure positive log() */

Personally I'd duplicate the comments from nbtreecostestimate rather
than just assuming the reader will go consult them. For that matter,
why didn't you duplicate nbtree's logic for charging for SA scans?
This bit seems just as relevant for GIN:

* If there are ScalarArrayOpExprs, charge this once per SA scan. The
* ones after the first one are not startup cost so far as the overall
* plan is concerned, so add them only to "total" cost.

Keep in mind also that pgindent will have its own opinions about how to
format these comments, and it can out-stubborn you. Either run the
comments into single paragraphs, or if you really want them to be two
paras then leave an empty comment line between. Another formatting
nitpick is that you seem to have added a number of unnecessary blank
lines.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-09-08 22:53:29 Re: Reducing the chunk header sizes on all memory context types
Previous Message Jacob Champion 2022-09-08 22:32:35 Re: [PATCH] Log details for client certificate failures