Re: Fix gin index cost estimation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Fix gin index cost estimation
Date: 2022-09-08 22:12:10
Message-ID: 4153708.1662675130@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> writes:
> Following the bug report at [1], I sent the attached patch to pgsql-bugs
> mailing list. I'm starting a thread here to add it to the next commitfest.

That link didn't work easily for me (possibly because it got split across
lines). Here's another one for anybody having similar issues:

https://www.postgresql.org/message-id/flat/CABs3KGQnOkyQ42-zKQqiE7M0Ks9oWDSee%3D%2BJx3-TGq%3D68xqWYw%40mail.gmail.com

> The problem I'm trying to solve is that, contrary to btree, gist and sp-gist
> indexes, gin indexes do not charge any cpu-cost for descending the entry tree.

As I said in the bug report thread, I think we really need to take a look
at all of our index AMs not just GIN. I extended your original reproducer
script to check all the AMs (attached), and suppressed memoize because it
seemed to behave differently for different AMs. Here's what I see for the
estimated costs of the inner indexscan, and the actual runtime, for each:

btree:
-> Index Only Scan using t1_btree_index on t1 (cost=0.28..0.30 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=20000)
Execution Time: 19.763 ms

gin (gin_trgm_ops):
-> Bitmap Heap Scan on t1 (cost=0.01..0.02 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=20000)
-> Bitmap Index Scan on t1_gin_index (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.003 rows=1 loops=20000)
Execution Time: 75.216 ms

gist:
-> Index Only Scan using t1_gist_index on t1 (cost=0.14..0.16 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=20000)
Execution Time: 277.799 ms

spgist:
-> Index Only Scan using t1_spgist_index on t1 (cost=0.14..0.16 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=20000)
Execution Time: 51.407 ms

hash:
-> Index Scan using t1_hash_index on t1 (cost=0.00..0.02 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=20000)
Execution Time: 13.090 ms

brin:
-> Bitmap Heap Scan on t1 (cost=0.03..18.78 rows=1 width=4) (actual time=0.049..0.093 rows=1 loops=20000)
-> Bitmap Index Scan on t1_brin_index (cost=0.00..0.03 rows=1500 width=0) (actual time=0.003..0.003 rows=70 loops=20000)
Execution Time: 1890.161 ms

bloom:
-> Bitmap Heap Scan on t1 (cost=11.25..11.26 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=20000)
-> Bitmap Index Scan on t1_bloom_index (cost=0.00..11.25 rows=1 width=0) (actual time=0.003..0.003 rows=2 loops=20000)
Execution Time: 88.703 ms

(These figures shouldn't be trusted too much because I did nothing
to suppress noise. They seem at least somewhat reproducible, though.)

So, taking btree as our reference point, gin has clearly got a problem
because it's estimating less than a tenth as much cost despite actually
being nearly 4X slower. gist and spgist are not as bad off, but
nonetheless they claim to be cheaper than btree when they are not.
The result for hash looks suspicious as well, though at least we'd
make the right index choice for this particular case. brin and bloom
correctly report being a lot more expensive than btree, so at least
for the moment I'm not worried about them.

BTW, the artificially small random_page_cost doesn't really affect
this much. If I set it to a perfectly reasonable value like 1.0,
gin produces a saner cost estimate but gist, spgist, and hash do
not change their estimates at all. btree's estimate doesn't change
either, which seems like it might be OK for index-only scans but
I doubt I believe it for index scans. In any case, at least one
of gin and hash is doing it wrong.

In short, I think gist and spgist probably need a minor tweak to
estimate more CPU cost than they do now, and hash needs a really
hard look at whether it's sane at all.

That's all orthogonal to the merits of your patch for gin,
so I'll respond separately about that.

regards, tom lane

Attachment Content-Type Size
costtest.sql text/plain 1.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-09-08 22:15:23 Re: Reducing the chunk header sizes on all memory context types
Previous Message Jacob Champion 2022-09-08 21:34:19 Re: CFM Manager