fixed tuple descs (was JIT compiling expressions/deform)

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: fixed tuple descs (was JIT compiling expressions/deform)
Date: 2017-12-06 09:37:17
Message-ID: 20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

One part of the work to make JITing worth its while is JITing tuple
deforming. That's currently often the biggest consumer of time, and if
not, it is most often among the top entries.

My experimentation shows that JITing tuple deforming is primarily beneficial
when it happens as *part* of JIT compiling expressions. I'd originally
tried to JIT compile deforming inside heaptuple.c, and cache the
deforming program inside the tuple slot. That turns out to not work very
well, because a lot of tuple descriptors are very short-lived, computed
during ExecInitNode(). Even if that were not the case, compiling for
each deforming on demand has significant downsides:
- it requires emitting code in smaller increments (whenever something
  new is deformed)
- because the generated code has to be generic for all potential
  deformers, the number of branches needed for that is
  significant. If instead the deforming code is generated for a
  specific callsite, no branches for the number of to-be-deformed
  columns have to be generated. The primary remaining branches then are
  the ones checking for NULLs and the number of attributes in the
  tuple, and those can often be optimized away if there are NOT NULL
  columns present.
- the call overhead is still noticeable
- the memory / function lifetime management is awkward.
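
To make the branch argument concrete, here is a minimal sketch of a generic
deformer next to one specialized for a single callsite. The tuple layout and
all names (ToyTuple, toy_deform_generic, toy_deform_first2_notnull) are
invented for illustration; this is not PostgreSQL's HeapTuple format and not
the code the JIT emits. It assumes at most 8 attributes, all of type int32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified tuple (NOT PostgreSQL's HeapTuple). */
typedef struct ToyTuple
{
    int            natts;    /* attributes physically stored in the tuple */
    uint8_t        nullbits; /* bit i set => attribute i is NULL */
    const uint8_t *data;     /* packed int32 values of the non-NULL attributes */
} ToyTuple;

/*
 * Generic deformer: usable from any callsite, so it has to loop up to a
 * runtime count and branch on natts and on the NULL bitmap per attribute.
 */
static void
toy_deform_generic(const ToyTuple *tup, int upto, int32_t *values, bool *isnull)
{
    const uint8_t *p = tup->data;

    assert(upto >= 0 && upto <= 8);
    for (int i = 0; i < upto; i++)
    {
        if (i >= tup->natts || (tup->nullbits & (1u << i)) != 0)
        {
            values[i] = 0;
            isnull[i] = true;
            continue;
        }
        memcpy(&values[i], p, sizeof(int32_t));
        isnull[i] = false;
        p += sizeof(int32_t);
    }
}

/*
 * Specialized for one callsite that needs exactly the first two columns,
 * both assumed to be NOT NULL and always present: no loop, no per-attribute
 * branches, fixed offsets.
 */
static void
toy_deform_first2_notnull(const ToyTuple *tup, int32_t *values, bool *isnull)
{
    assert(tup->natts >= 2 && (tup->nullbits & 0x3) == 0);
    memcpy(&values[0], tup->data, sizeof(int32_t));
    memcpy(&values[1], tup->data + sizeof(int32_t), sizeof(int32_t));
    isnull[0] = false;
    isnull[1] = false;
}
```

The specialized variant is only valid under the stated assumptions about the
descriptor, which is why the descriptor has to be known when the code is
generated.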

If the JITing of expressions is instead done as part of expression
evaluation we can emit all the necessary code for the whole plantree
during executor startup, in one go. And, more importantly, LLVM's
optimizer is free to inline the deforming code into the expression code,
often yielding noticeable improvements (although that could still use
some work).

To allow doing JITing at ExecReadyExpr() time, we need to know the tuple
descriptor an EEOP_{INNER,OUTER,SCAN}_FETCHSOME step refers to. There are
currently three major impediments to that.

1) At a lot of ExecInitExpr() callsites the tupledescs for inner, outer,
scan aren't yet known. Therefore that code needs to be reordered so
we (if applicable):
   a) initialize subsidiary nodes, thereby determining the left/right
      (inner/outer) tupledescs
   b) initialize the scan tuple desc, often that refers to a)
   c) determine the result tuple desc, required to build the projection
   d) build projections
   e) build expressions

Attached is a patch doing so. Currently it only applies with a few
preliminary patches applied, but that could be easily reordered.

The patch is relatively large, as I decided to try to get the
different ExecInitNode functions to look a bit more similar. There
are some judgement calls involved, but I think the result looks a good
bit better, regardless of the later need.

I'm not really happy with the preexisting split of functions
between execScan.c, execTuples.c, execUtils.c. I wonder if the
majority, except the low-level slot ones, shouldn't be moved to
execUtils.c; I think that'd be clearer. There seems to be no
justification for execScan.c to contain
ExecAssignScanProjectionInfo[WithVarno].
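
The ordering a) through e) can be sketched with a toy node initializer. The
types and names below (ToyDesc, ToyNode, toy_init_node) are invented for
illustration and do not match the executor's real structures; the point is
only that expressions are built last, when the scan descriptor is already
known.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyDesc
{
    int natts;
} ToyDesc;

/* Hypothetical executor node; not PostgreSQL's PlanState. */
typedef struct ToyNode
{
    struct ToyNode *outer;       /* child node, or NULL for a leaf */
    ToyDesc         base_desc;   /* leaf only: descriptor of the scanned relation */
    int             nproj;       /* number of projected output columns */
    const ToyDesc  *scan_desc;   /* set in step b) */
    ToyDesc         result_desc; /* set in step c) */
    int             proj_ready;  /* set in step d) */
    int             fetch_natts; /* set in step e), from a known descriptor */
} ToyNode;

static void
toy_init_node(ToyNode *node)
{
    /* a) initialize subsidiary nodes first; that fixes their result descs */
    if (node->outer != NULL)
        toy_init_node(node->outer);

    /* b) the scan desc is often just the child's result desc */
    node->scan_desc = (node->outer != NULL) ? &node->outer->result_desc
                                            : &node->base_desc;

    /* c) determine the result desc, needed for the projection */
    node->result_desc.natts = node->nproj;

    /* d) build the projection */
    node->proj_ready = 1;

    /* e) build expressions last: the descriptor they deform is now known */
    assert(node->scan_desc != NULL);
    node->fetch_natts = node->scan_desc->natts;
}
```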

2) TupleSlots need to describe whether they'll contain a fixed tupledesc
for their whole lifetime, or whether they can change their nature. Most
places don't need to ever change a slot's identity, but in a few
places it's quite convenient.

I've introduced the notion that a slot's tupledesc can be marked as
"fixed", by passing a tupledesc at the slot's creation. That also gains
a bit of efficiency (less memory management overhead, higher cache hit
ratio) because the slot, tts_values, tts_isnull can be allocated in one
chunk.
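
A minimal sketch of the one-chunk allocation, assuming a much simplified slot.
ToySlot, ToyDatum, ToyTupleDesc and toy_make_slot are hypothetical names, and
plain calloc() stands in for PostgreSQL's memory contexts; the real slot
struct has more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t ToyDatum;      /* stand-in for Datum */

typedef struct ToyTupleDesc
{
    int natts;
} ToyTupleDesc;

/* Hypothetical slot; the real TupleTableSlot has more fields. */
typedef struct ToySlot
{
    bool                fixed;   /* descriptor never changes for this slot */
    const ToyTupleDesc *desc;
    ToyDatum           *values;  /* plays the role of tts_values */
    bool               *isnull;  /* plays the role of tts_isnull */
} ToySlot;

/* The values array directly follows the struct, so it must stay aligned. */
_Static_assert(sizeof(ToySlot) % _Alignof(ToyDatum) == 0,
               "values array would be misaligned");

/*
 * With a descriptor given at creation the slot is "fixed", and the struct
 * and both arrays come from one allocation. Without one, only the struct is
 * allocated and the arrays would be set up later (not shown).
 */
static ToySlot *
toy_make_slot(const ToyTupleDesc *desc)
{
    size_t  natts = (desc != NULL) ? (size_t) desc->natts : 0;
    size_t  values_off = sizeof(ToySlot);
    size_t  isnull_off = values_off + natts * sizeof(ToyDatum);
    char   *chunk = calloc(1, isnull_off + natts * sizeof(bool));
    ToySlot *slot;

    if (chunk == NULL)
        return NULL;

    slot = (ToySlot *) chunk;
    slot->fixed = (desc != NULL);
    slot->desc = desc;
    if (desc != NULL)
    {
        slot->values = (ToyDatum *) (chunk + values_off);
        slot->isnull = (bool *) (chunk + isnull_off);
    }
    return slot;
}
```

A single free() on the returned pointer releases the slot and both arrays.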

3) At expression initialization time we need to figure out what slots
(or just descs) INNER/OUTER/SCAN refer to. I've solved that by looking
up inner/outer/scan via the provided parent node, which required
adding a new field to store the scan slot.

Currently no expressions initialized with a parent node have an
INNER/OUTER/SCAN slot + desc that doesn't refer to the relevant node,
but I'm not sure I like that as a requirement.
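
The lookup via the parent node could look roughly like the sketch below. All
names are hypothetical (ToyPlanState, scan_slot, toy_fetch_desc); in
particular scan_slot only stands in for the new field mentioned above, whose
real name is not given here. Returning NULL models "descriptor not known at
initialization time", in which case a caller would have to fall back to
generic deforming.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyTupleDesc
{
    int natts;
} ToyTupleDesc;

typedef struct ToySlot
{
    bool                fixed;  /* descriptor fixed for the slot's lifetime */
    const ToyTupleDesc *desc;
} ToySlot;

typedef enum ToyFetchKind
{
    TOY_INNER,
    TOY_OUTER,
    TOY_SCAN
} ToyFetchKind;

/* Hypothetical parent node; not PostgreSQL's PlanState. */
typedef struct ToyPlanState
{
    struct ToyPlanState *lefttree;    /* outer child */
    struct ToyPlanState *righttree;   /* inner child */
    ToySlot             *result_slot; /* slot this node returns tuples in */
    ToySlot             *scan_slot;   /* stand-in for the newly added field */
} ToyPlanState;

/*
 * Find the descriptor a FETCHSOME-style step would deform, or NULL if it
 * cannot be determined or is not guaranteed to stay the same.
 */
static const ToyTupleDesc *
toy_fetch_desc(const ToyPlanState *parent, ToyFetchKind kind)
{
    const ToySlot *slot = NULL;

    if (parent == NULL)
        return NULL;

    switch (kind)
    {
        case TOY_OUTER:
            if (parent->lefttree != NULL)
                slot = parent->lefttree->result_slot;
            break;
        case TOY_INNER:
            if (parent->righttree != NULL)
                slot = parent->righttree->result_slot;
            break;
        case TOY_SCAN:
            slot = parent->scan_slot;
            break;
    }

    if (slot == NULL || !slot->fixed)
        return NULL;

    assert(slot->desc != NULL);
    return slot->desc;
}
```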

Attached is a patch that implements 1 + 2. I'd welcome a quick look
through it. It currently only applies on top of a few other recently
submitted patches, but it'd just be an hour's work or so to reorder
that.

Comments about either the outline above or the patch?

Regards,

Andres

Attachment Content-Type Size
0001-WIP-Allow-tupleslots-to-have-a-fixed-tupledesc-use-i.patch text/x-diff 94.1 KB
