Re: extend pgbench expressions with functions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: extend pgbench expressions with functions
Date: 2016-02-16 00:18:37
Message-ID: CA+TgmoaD9HtBGyK_dJEABZhtgy1eAoVgLLYY3iMGAfF62VH2VQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 15, 2016 at 4:58 AM, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
> Indeed!
>
>> Taking patch 1 as a completely independent thing, there is no need to
>> introduce PgBenchValueType yet. Similar remark for setIntValue and
>> coerceToInt. They are useful afterwards when introducing double types to be
>> able to handle double input parameters for the gaussian and other functions.
>
> Yes. This is exactly the pain I'm trying to avoid, creating a different
> implementation for the first patch, which is just overriden when the second
> part is applied...

Splitting a patch means breaking it into independently committable
sub-patches, not just throwing each line of the diff into a different
pile as best you can. I'm with Michael: that part doesn't belong in
this patch. If we want to have an infrastructure refactoring patch
that just replaces every relevant use of int64 with PgBenchValue, a
union supporting only integer values, then we can do that first and
have a later patch introduce double as a separate change. But we
can't have things that are logically part of patch #2 just tossed in
with patch #1.

I was in the middle of ripping that out of the patch when I realized
that evalFunc() is pretty badly designed. What you've done is made it
the job of each particular function to evaluate its arguments. I
don't think that's a good plan. I think that when we discover that
we've got a function, we should evaluate all of the arguments that
were passed to it using common code that is shared across all types of
functions and operators. Then, the evaluated arguments should be
passed to the function-specific code, which can do as it likes with
them. This way, you have less code that is specific to particular
operations and more common code, which is generally a good thing.
Every expression evaluation engine that I've ever heard of works this
way - see, for example, the PostgreSQL backend.

I experimented with trying to do this and ran into a problem: where
exactly would you store the evaluated arguments when you don't know
how many of them there will be? And even if you did know how many of
them there will be, wouldn't that mean that evalFunc or evaluateExpr
would have to palloc a buffer of the correct size for each invocation?
That's far more heavyweight than the current implementation, and
minimizing CPU usage inside pgbench is a concern. It would be
interesting to do some pgbench runs with this patch, or the final
patch, and see what effect it has on the TPS numbers, if any, and I
think we should. But the first concern is to minimize any negative
impact, so let's talk about how to do that.

I think we should limit the number of arguments to a function to, say,
16, so that an array of int64s or PgBenchValues long enough to hold an
entire argument list can be stack-allocated. The backend's limit is
higher, but the only reason we need a value higher than 2 here is
because you've chosen to introduce variable-argument functions; but I
think 16 is enough for any practical purpose. Then, I think we should
also change the parse tree representation so that transforms the
linked-list into an array stored within the PgBenchExpr, so that you
can access the expression for argument i as expr->u.arg[i]. Then we
can write this is a loop that evaluates each expression in an array of
expressions and stores the result in an array of values. That seems
like it would be both cleaner and faster than what you've got here
right now. It's also more similar to what you did with the function
name itself: the most trivial thing the parser could do is store a
pointer to the function or operator name, but that would be slow, so
instead it looks up the function and stores a constant.

I also went over your documentation changes. I think you inserted the
new table smack dab in the middle of a section in a place that it
doesn't really belong. The version attached makes it into its own
<refsect2>, puts it in a new section a bit further down so that it
doesn't break up the flow, and has a few other tweaks that I think are
improvements.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
pgbench-doc.patch application/x-patch 3.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2016-02-16 00:28:23 Re: Declarative partitioning
Previous Message Tatsuo Ishii 2016-02-16 00:09:52 Re: planstats.sgml