Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

From: James Coleman <jtc331(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date: 2020-09-08 19:25:41
Message-ID: CAAaqYe88MaKdnK=Gpn_0VvjeJb9UBuPMvnWzbuTW_rtM5G+G4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 19, 2020 at 3:16 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 01/05/2020 05:20, James Coleman wrote:
> > On Tue, Apr 28, 2020 at 8:25 AM Tomas Vondra
> > <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > ...
> >> Any particular reasons to pick dynahash over simplehash? ISTM we're
> >> using simplehash elsewhere in the executor (grouping, tidbitmap, ...),
> >> while there are not many places using dynahash for simple short-lived
> >> hash tables. Of course, that alone is a weak reason to insist on using
> >> simplehash here, but I suppose there were reasons for not using dynahash
> >> and we'll end up facing the same issues here.
> >
> > I've attached a patch series that includes switching to simplehash.
> > Obviously we'd really just collapse all of these patches, but it's
> > perhaps interesting to see the changes required to use each style
> > (binary search, dynahash, simplehash).
> >
> > As before, there are clearly comments and naming things to be
> > addressed, but the implementation should be reasonably clean.
>
> Looks good, aside from the cleanup work that you mentioned. There are a
> few more cases that I think you could easily handle with very little
> extra code:
>
> You could also apply the optimization for non-Const expressions, as long
> as they're stable (i.e. !contain_volatile_functions(expr)).

Is that true? Don't we also have to worry about whether or not the
value is stable (i.e., know when a param has changed)? There have been
discussions about being able to cache stable subexpressions, and my
understanding was that we'd need to have that infrastructure (along
with the necessarily invalidation when the param changes) to be able
to use this for non-Const expressions.

> Deconstructing the array Datum into a simple C array on first call would
> be a win even for very small arrays and for AND semantics, even if you
> don't use a hash table.

Because you wouldn't have to repeatedly detoast it? Or some other
reason I'm not thinking of? My intuition would have been that (aside
from detoasting if necessary) there wouldn't be much real overhead in,
for example, an array storing integers.

James

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Zhang 2020-09-08 19:26:19 Re: PATCH: Attempt to make dbsize a bit more consistent
Previous Message Tom Lane 2020-09-08 19:24:54 Re: More aggressive vacuuming of temporary tables