Re: Bloom index cost model seems to be wrong

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Thomas Kellerer <spam_eater(at)gmx(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Bloom index cost model seems to be wrong
Date: 2019-02-28 18:30:11
Message-ID: 20591.1551378611@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-performance

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> Should we be trying to estimate the false positive rate and charging
> cpu_tuple_cost, cpu_operator_cost, and the IO costs for visiting the table to
> recheck and reject those? I don't think other index types do that, and I'm
> inclined to think the burden should be on the user not to create silly
> indexes that produce an overwhelming number of false positives.
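A rough false-positive estimate for a bloom signature could use the standard Bloom-filter approximation. The sketch below is illustrative only. It assumes that every indexed column sets `bits_per_col` bits at independent, uniformly random positions in a shared signature of `sig_bits` bits, and that a query constraining `query_cols` columns checks `bits_per_col` bits for each of them. The function names are hypothetical and do not come from contrib/bloom or costsize.c.

```python
def bloom_false_positive_rate(sig_bits: int, bits_per_col: int,
                              indexed_cols: int, query_cols: int) -> float:
    """Approximate chance that a non-matching row passes the signature check.

    Assumption: bits are set independently and uniformly at random
    (the classic Bloom-filter approximation), not contrib/bloom's exact hashing.
    """
    if sig_bits <= 0 or bits_per_col <= 0 or not 0 < query_cols <= indexed_cols:
        raise ValueError("invalid bloom parameters")
    bits_set = bits_per_col * indexed_cols
    # Probability that any one given bit is set in a row's signature.
    p_bit = 1.0 - (1.0 - 1.0 / sig_bits) ** bits_set
    # A false positive requires every bit the query checks to be set by chance.
    return p_bit ** (bits_per_col * query_cols)


def expected_false_positives(table_rows: float, true_matches: float,
                             fp_rate: float) -> float:
    """Hypothetical helper: non-matching rows expected to need a heap recheck."""
    return fp_rate * (table_rows - true_matches)
```

With an 80-bit signature, 2 bits per column and 4 indexed columns (the documented contrib/bloom defaults for `length` and `colN`), a query on one column gives `bloom_false_positive_rate(80, 2, 4, 1)` of about 0.9%. A query on two columns gives a much smaller rate, because it must match twice as many bits by chance.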

Heap-access costs are added on in costsize.c, not in the index
cost estimator. I don't remember at the moment whether there's
any explicit accounting for lossy indexes (i.e. false positives).
Up to now, there haven't been cases where we could estimate the
false-positive rate with any accuracy, so we may just be ignoring
the effect. But if we decide to account for it, I'd rather have
costsize.c continue to add on the actual cost, perhaps based on
a false-positive-rate fraction returned by the index estimator.
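The division of labour proposed above can be sketched as follows. The index estimator only reports a false-positive fraction, and the caller (standing in for costsize.c) converts it into heap-access cost. Everything here is hypothetical. The `false_positive_fraction` field does not exist in PostgreSQL. The per-tuple heap charge is a deliberately simplified stand-in for costsize.c's real page-fetch estimation, using only the default values of `random_page_cost`, `cpu_tuple_cost` and `cpu_operator_cost`.

```python
from dataclasses import dataclass

# Default planner GUC values; the flat per-tuple charge below is a simplification.
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025


@dataclass
class IndexCostEstimate:
    """Hypothetical output of an index AM's cost estimator."""
    total_cost: float               # cost of scanning the index itself
    selectivity: float              # fraction of heap rows that truly match
    false_positive_fraction: float  # hypothetical: share of non-matching rows returned


def total_index_scan_cost(est: IndexCostEstimate, table_rows: float) -> float:
    """Caller-side (costsize.c-like) addition of heap access and recheck cost."""
    true_hits = table_rows * est.selectivity
    false_hits = (table_rows - true_hits) * est.false_positive_fraction
    fetched = true_hits + false_hits
    # Every fetched tuple, including false positives, is visited and rechecked.
    per_tuple = RANDOM_PAGE_COST + CPU_TUPLE_COST + CPU_OPERATOR_COST
    return est.total_cost + fetched * per_tuple
```

Keeping the fraction separate from the heap-cost arithmetic means each index AM only needs to estimate its own lossiness. The planner-wide cost rules then stay in one place.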

regards, tom lane
