Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: James Coleman <jtc331(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date: 2020-04-23 22:09:36
Message-ID: CAApHDvqkXamVm6p=T4+p72pMv6+V+q4B=t2iKmwZhw_9wZOoyg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 24 Apr 2020 at 02:56, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Thu, Apr 23, 2020 at 09:02:26AM -0400, James Coleman wrote:
> >On Thu, Apr 23, 2020 at 8:47 AM Tomas Vondra
> ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >As to the choice of 9 elements: I just picked that as a starting
> >point; Andres had previously commented off hand that at 8 elements
> >serial scanning was faster, so I figured this was a reasonable
> >starting point for discussion.
> >
> >Perhaps it would make sense to determine that minimum not as a pure
> >constant but scaled based on how many rows the planner expects us to
> >see? Of course that'd be a more invasive patch...so may or may not be
> >as feasible as a reasonable default.
> >
>
> Not sure. That seems a bit overcomplicated and I don't think it depends
> on the number of rows the planner expects to see very much. I think we
> usually assume the linear search is cheaper for small arrays and then at
> some point the binary search starts winning The question is where this
> "break even" point is.
>
> I think we usually use something like 64 or so in other places, but
> maybe I'm wrong. The current limit 9 seems a bit too low, but I may be
> wrong. Let's not obsess about this too much, let's do some experiments
> and pick a value based on that.

If single comparison for a binary search costs about the same as an
equality check, then wouldn't the crossover point be much lower than
64? The binary search should find or not find the target in log2(N)
rather than N. ceil(log2(9)) is 4, which is of course less than 9.
For 64, it's 6, so are you not just doing a possible 58 equality
checks than necessary? Of course, it's a bit more complex as for
values that *are* in the array, the linear search will, on average,
only check half the values. Assuming that, then 9 does not seem too
far off. I guess benchmarks at various crossover points would speak a
thousand words.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-04-23 22:35:44 Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Previous Message Andres Freund 2020-04-23 21:16:29 Re: backup manifests