Re: WIP: Faster Expression Processing v4

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: Faster Expression Processing v4
Date: 2017-03-14 17:31:37
Message-ID: 20170314173137.5r5jtgkgx6siuwsy@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2017-03-14 16:58:54 +0200, Heikki Linnakangas wrote:
> On 03/14/2017 08:53 AM, Andres Freund wrote:
> > Besides that, this version has:
> > - pgindented most of the affected pieces (i.e. all significant new code
> > has been reindent, not all touched one)
>
> I think you'll need to add all the inner structs ExprEvalStep typedefs.list
> to indent them right.

Yea, I saw that. Since they're not named atm, That'd probably not work
well. I'd be inclined to just live with it :/

> I looked at patch 0004.

Thanks!

> * EEO_DISPATCH_DIRECT(op) takes 'op' as an argument, but it really assumes
> that 'op' has already been set to point to the jump target. I find that a
> bit weird. I guess the idea is that you always pass the Program Counter
> variable as 'op' argument. For consistency, would be good if EEO_SWITCH()
> also took just 'op' as the argument, rather than op->opcode. But I think it
> would be more clear if they should both just assumed that there's a variable
> called 'op' that points to the current instruction.

Hm. I dislike magic variable names. I think I prefer your idea of
passing the op entirely to EEO_SWITCH.

> * All the callers of EEO_DISPATCH_DIRECT(op) set 'op' just prior to calling
> EEO_DISPATCH_DIRECT(op). How about having a macro EEO_JUMP(<step number>),
> to encapsulate setting 'op' and jumping to it in one operation?

Ok.

> * ExecEvalStepOp() seems relatively expensive, with the linear scan over all
> the opcodes, if called on an ExprState that already has
> EEO_FLAG_JUMP_THREADED set. All the callers use it to check if the opcode is
> a particular one, so you could check if the opcode matches that particular
> one, instead of scanning the dispatch table to find what it is.

It should be fairly cheap at that place, unless the same expression is
instantiated twice for some reason. I have been wondering about adding
a second table ptr->op that can be binary searched for the operation to
make it cheaper.

> * But is ExecEvalStepOp() ever actually get called on an expression with
> EEO_FLAG_JUMP_THREADED already set? It's only used in
> ExecInstantiateInterpretedExpr(), and it's called only once on each
> ExprState. How about just adding an Assert that EEO_FLAG_JUMP_THREADED is
> not yet set? Or drop the EEO_FLAG_JUMP_THREADED flag altogether, and assert
> that evalfunc != ExecInterpExpr.

The primary use of ExecEvalStepOp() is really debugging, so I'd like to
avoid adding that restriction. I guess we can add a second function for
this if needed.

> * How tight are we on space in the ExprEvalStep union? Currently, the
> jump-threading preparation replaces the opcodes with the goto labels, but it
> would be really nice to keep the original opcodes, for debugging purposes if
> nothing else.

There's nothing left to spare :(. For debugging I've just used
ExecEvalStepOp() - it's inconvenient to directly print the struct
anyway, since gdb prints the whole union...

> * execInterpExpr.c is quite a mouthful. How about exprInterpreter.c?

> Attached is a patch, on top of your
> 0004-Faster-expression-evaluation-and-targetlist-projecti.patch, with some
> minor tweaks and comments here and there (search for HEIKKI). It's mostly
> the same stuff I listed above, implemented in a quick & dirty way. I hope
> it's helpful.

Thanks!

> typedef enum ExprEvalOp
> {
> + /*
> + * HEIKKI: How about renaming these to EEOP_* ? I think it'd be a
> + * good mnemonic to remember that these are opcodes, and just one
> + * extra letter.
> + */

I don't mind the change.

> + /* HEIKKI: Is it safe to assume that sizeof(size_t) >= sizeof(void *) ? */

Yes, seems very likely - it's supposed to hold any potential sizeof()
return value. You could end up on platforms where that's not true, if
it used tagged pointers or such. I guess we could instead use a union,
but that doesn't seem that much of a benefit.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-03-14 17:34:10 Re: WIP: Faster Expression Processing v4
Previous Message Robert Haas 2017-03-14 17:29:14 Re: Write Ahead Logging for Hash Indexes