VALUES nodes and expression initialization

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: VALUES nodes and expression initialization
Date: 2018-01-29 21:46:52
Message-ID: 20180129214652.pqr23ykzemohpkpw@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In contrast to most other nodes, nodeValuescan.c does expression
initialization at "runtime" rather than in initialization:

/*
* Get rid of any prior cycle's leftovers. We use ReScanExprContext
* not just ResetExprContext because we want any registered shutdown
* callbacks to be called.
*/
ReScanExprContext(econtext);

/*
* Build the expression eval state in the econtext's per-tuple memory.
* This is a tad unusual, but we want to delete the eval state again
* when we move to the next row, to avoid growth of memory
* requirements over a long values list.
*/
oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);

this does make a good bit of sense for things like
INSERT ... VALUES (...), (...), ...
or even just a plain long
VALUES (...), (...), ...
statement (albeit the latter being a bit pointless).

but when *joining* something with a VALUES(), that's far less
beneficial. That can lead to the same VALUES() node being scanned over
and over, doing a lot of redundant initialization.

I noticed this when JITing the regression tests, and forcing every
single expression to be JITed. So maybe, just maybe, not the most
relevant case.

But it's not particularly hard to think of scenarios where that leads to
spending a good chunk of time in expression initialization, even leading
JIT aside.

The JIT case is easy enough to fix / work around, I'm however wondering
if we should do something about the repeated scan case. It's a
reasonably common pattern to join to VALUES to e.g. add additional
information to the results of an aggregate.

The case that trgiggered me looking at this is:
regression[17061][1]=# explain select count(*) from tenk1 a, tenk1 b join lateral (values(a.unique1),(-1)) ss(x) on b.unique2 = ss.x;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate (cost=1437.69..1437.70 rows=1 width=8) │
│ -> Hash Join (cost=394.91..1387.81 rows=19952 width=0) │
│ Hash Cond: ("*VALUES*".column1 = b.unique2) │
│ -> Nested Loop (cost=0.29..718.84 rows=19952 width=4) │
│ -> Index Only Scan using tenk1_unique1 on tenk1 a (cost=0.29..269.93 rows=9976 width=4) │
│ -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4) │
│ -> Hash (cost=269.93..269.93 rows=9976 width=4) │
│ -> Index Only Scan using tenk1_unique2 on tenk1 b (cost=0.29..269.93 rows=9976 width=4) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

Here materialization doesn't help, because there's a lateral dependency
preventing VALUES from materializing.

Does anybody think such cases are common enough that we should do
something about the constant re-initialization?

Greetings,

Andres Freund

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-01-29 21:51:38 Re: JIT compiling with LLVM v9.0
Previous Message Rick Otten 2018-01-29 21:35:53 Re: dsa_allocate() faliure