Re: extend pgbench expressions with functions

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: extend pgbench expressions with functions
Date: 2015-11-05 20:33:15
Message-ID: alpine.DEB.2.10.1511052018410.8244@sto
Lists: pgsql-hackers


Hello Robert,

>>> 2. ddebug and idebug seem like a lousy idea to me.
>>
>> It was really useful to me for debugging and testing.
>
> That doesn't mean it belongs in the final patch.

I think it is useful when debugging a script, not just for debugging the
evaluation code itself.
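
For instance, in a script one could write something like the following
(a hypothetical snippet; as I understand the patch, idebug prints the
integer value of its argument on stderr and returns it unchanged):

  \set aid idebug(random(1, 100000))

which shows the drawn value on each execution without otherwise changing
the script's behavior.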

>>> 3. I'm perplexed by why you've replaced evaluateExpr() with evalInt()
>>> and evalDouble().
>>
>> Basically, the code is significantly shorter and elegant with this option.
>
> I find that pretty hard to swallow.

Hmmm, maybe it would be easier to swallow with some more explanation?

> If the backend took this approach,

Sure, I would never suggest doing anything like that in the backend!

> we've have a separate evaluate function for every datatype, which would
> make it completely impossible to support the creation of new datatypes.

In the backend, the implementation is generic about types (one can add a
type to the system dynamically), which is a whole other level of
abstraction; the two do not compare in any way.

> And I find this code to be quite difficult to follow.

It is really the function *you* wrote, and there is just one version for
int and one for double.

> What I think we should have is a type like PgBenchValue that represents
> a value which may be an integer or a double or whatever else we want to
> support, but not an expression - specifically a value. Then when you
> invoke a function or an operator it gets passed one or more PgBenchValue
> objects and can do whatever it wants with those, storing the result into
> an output PgBenchValue. So add might look like this:

Sure, I do know what it looks like, and I want to avoid it, because this
is just a lot of ugly code which is useless for pgbench's purposes.

> void
> add(PgBenchValue *result, PgBenchValue *x, PgBenchValue *y)
> {
>     if (x->type == PGBT_INTEGER && y->type == PGBT_INTEGER)
>     {
>         result->type = PGBT_INTEGER;
>         result->u.ival = x->u.ival + y->u.ival;
>         return;
>     }
>     if (x->type == PGBT_DOUBLE && y->type == PGBT_DOUBLE)
>     {
>         result->type = PGBT_DOUBLE;
>         result->u.dval = x->u.dval + y->u.dval;
>         return;
>     }
>     /* cross-type cases, if desired, go here */

Why reject 1 + 2.0? The cross-type cases are really required for user
sanity, which adds:

    if (x->type == PGBT_DOUBLE && y->type == PGBT_INTEGER)
    {
        result->type = PGBT_DOUBLE;
        result->u.dval = x->u.dval + y->u.ival;
        return;
    }
    if (x->type == PGBT_INTEGER && y->type == PGBT_DOUBLE)
    {
        result->type = PGBT_DOUBLE;
        result->u.dval = x->u.ival + y->u.dval;
        return;
    }
> }

For the overloaded '+' operator with conversions there are 4 cases (2
types ** 2 arguments) to handle. For all 5 binary operators (+ - * / %),
that makes 20 cases. Then for every function you have to deal with type
conversion as well: each function costs #types ** #arguments cases. That
is 2 cases for abs, 4 cases for random, 8 cases for each random_exp*, 8
for random_gaus*, and so on. Some thinking would be required for n-ary
functions (min & max).

Basically it can be done, there is no technical issue, it is just a matter
of writing a *lot* of repetitive code, hundreds of lines of it. As I think
it does not bring any value for pgbench's purposes, I used the other
approach, which reduces the code size by avoiding the combinatorial
"cross-type" conversions.
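
To illustrate, here is a minimal sketch of what the eval-per-type approach
looks like; the node layout and names below are illustrative, not the
actual patch code. Each type gets one eval function, and coercions happen
at the leaves, so 1 + 2.0 just works without enumerating type combinations:

#include <stdint.h>

typedef enum { ENODE_INT_CONST, ENODE_DOUBLE_CONST, ENODE_ADD } ExprKind;

typedef struct Expr
{
    ExprKind    kind;
    int64_t     ival;               /* for ENODE_INT_CONST */
    double      dval;               /* for ENODE_DOUBLE_CONST */
    struct Expr *left, *right;      /* for ENODE_ADD */
} Expr;

static double evalDouble(Expr *e);

/* evaluate an expression in an integer context */
static int64_t
evalInt(Expr *e)
{
    switch (e->kind)
    {
        case ENODE_INT_CONST:
            return e->ival;
        case ENODE_DOUBLE_CONST:
            return (int64_t) e->dval;       /* coerce double -> int */
        case ENODE_ADD:
            return evalInt(e->left) + evalInt(e->right);
    }
    return 0;                               /* not reached */
}

/* evaluate the same tree in a double context */
static double
evalDouble(Expr *e)
{
    switch (e->kind)
    {
        case ENODE_INT_CONST:
            return (double) e->ival;        /* coerce int -> double */
        case ENODE_DOUBLE_CONST:
            return e->dval;
        case ENODE_ADD:
            return evalDouble(e->left) + evalDouble(e->right);
    }
    return 0.0;                             /* not reached */
}

Each operator and function body appears once per type, never once per type
combination.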

> The way you have it, the logic for every function and operator exists
> in two copies: one in the integer function, and the other in the
> double function.

Yep, but there are only 2 cases to handle instead of 4 as in the above
example; that is the point.

> As soon as we add another data type - strings, dates, whatever - we'd
> need to have cases for all of these things in those functions as well,
> and every time we add a function, we have to update all the case
> statements for every datatype's evaluation function. That's just not a
> scalable model.

On the contrary, it is reasonably scalable:

With the eval-per-type approach, for every added type one implements a
single eval function which handles all functions and operators, each eval
function being roughly the same size as the current ones.

With the above approach, an overloaded add function handling 2 operands
over 3 types ('post' + 'gres' -> 'postgres') would have to deal with
3**2 = 9 type combinations instead of 4, so basically it would more than
double the code size. If you add dates on top of that, 4**2 = 16 cases
just for one operator. No difficulty there, just a lot of lines...

Basically the code size complexity of the above approach is:

#functions * (#types ** #args)

while for the approach in the submitted patch it is:

#types * #functions
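
Plugging in the current figures: with 2 types, the 5 binary operators
alone cost 5 * (2 ** 2) = 20 cases with the first approach versus
2 * 5 = 10 with the second; with a third type that becomes
5 * (3 ** 2) = 45 versus 3 * 5 = 15.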

Now I obviously agree that the approach is not generic and should not be
used in other contexts; it is just that it is simple and short, and it
fits the bill for pgbench.

Note that I do not really envision adding more types for pgbench scripts.
The double type is just a side effect of the non-uniform randoms. What I
plan is to add a few functions, especially a pseudo-random permutation
and/or a parametric hash, which would allow running more realistic tests
with non-uniform distributions.
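
As a purely illustrative example of the kind of thing I mean, and
certainly not the planned implementation, an affine map is about the
simplest parametric pseudo-random permutation one can get:

#include <stdint.h>

/*
 * (a * i + b) % size visits every value of [0, size) exactly once
 * provided gcd(a, size) == 1, so ids drawn from a skewed distribution
 * can be scattered over the whole key space instead of always hitting
 * the same few rows. Overflow of a * i is ignored here for clarity.
 */
static int64_t
affine_permute(int64_t i, int64_t size, int64_t a, int64_t b)
{
    return (a * i + b) % size;
}

Seeding a and b per run would give a different shuffle each time.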

--
Fabien.
