Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: James Coleman <jtc331(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date: 2021-04-10 22:03:23
Message-ID: CAApHDvqQHVXGHje6zHo8mgfLq6-CPEn0pR-KvZtCmfxF1Fwu7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 10 Apr 2021 at 10:32, Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> But I think the puzzle is not so much about v5 vs v6, but more about v5
> vs. master. I still don't understand how v5 managed to be faster than
> master, but maybe I'm missing something.

Well, v5 wrapped ScalarArrayOpExpr inside HashedScalarArrayOpExpr, but
didn't add cases for HashedScalarArrayOpExpr in all locations where it
should have. For example, fix_expr_common() has a case for
ScalarArrayOpExpr, but if we gave it a HashedScalarArrayOpExpr it
would have very little work to do.

I've not gone and proved that's the exact reason why the planner
became faster, but I really don't see any other reason.

> Maybe the right solution is to rely on the estimates, but then also
> enable the hashing if we significantly cross the threshold during
> execution. So for example we might get estimate 10 rows, and calculate
> that the hashing would start winning at 100 rows, so we start without
> hashing. But then at execution if we get 200 rows, we build the hash
> table and start using it.

To do that, we'd need to store the number of evaluations of the
function somewhere. I'm not really sure that would be a good idea as I
imagine we'd need to store that in ExprEvalStep. I imagine if that
was a good idea then we'd have done the same for JIT.

> On 4/9/21 1:21 AM, David Rowley wrote:
> > time values are in seconds:
> >
> > pg-master 28.07
> > prepared 11.28
> > simple 16.79
> >
> > pg-v6 15.86
> > prepared 5.23
> > simple 10.63
> >
> > So, the overall result when applying the total time is 177%.
> >
>
> Not sure how you got those numbers, or how it explains the results.

I got it by doing "1 / tps * 1000" to get the time it would take to
execute 1000 transactions of each of your tests. I then grouped by
patch, mode and took the sum of the calculated number. My point was
that overall the patch is significantly faster. I was trying to
highlight that the 0 and 1 row test take up very little time and the
overhead of building the hash table is only showing up because the
query executes so quickly.

FWIW, I think executing a large IN clause on a table that has 0 rows
is likely not that interesting a case to optimise for. That's not the
same as a query that just returns 0 rows due to filtering out hundreds
or thousands of rows during execution. The overhead of building the
hash table is not going to show up very easily in that sort of case.

> I understand why the prepared mode got slower. I don't understand how
> the simple mode got faster.

I very much imagine v5 was faster at planning due to the unfinished
nature of the patch. I'd not added support for HashedScalarArrayOpExpr
in all the places I should have. That would result in the planner
skipping lots of work that it needs to do. The way I got it to work
was to add it, then just add enough cases in the planner to handle
HashedScalarArrayOpExpr so I didn't get any errors. I stopped after
that just to show the idea. Lack of errors does not mean it was
correct. At least setrefs.c was not properly handling
HashedScalarArrayOpExpr.

I really think it would be best if we just ignore the performance of
v5. Looking at the performance of a patch that was incorrectly
skipping a bunch of required work does not seem that fair.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-04-10 22:38:53 Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Previous Message Tom Lane 2021-04-10 21:57:43 Re: Reference Leak with type