POC: Sharing record typmods between backends

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: POC: Sharing record typmods between backends
Date: 2017-04-07 05:21:35
Message-ID: CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

Tuples can have type RECORDOID and a typmod number that identifies a
"blessed" TupleDesc in a backend-private cache. To support the
sharing of such tuples through shared memory and temporary files, I
think we need a typmod registry in shared memory. Here's a
proof-of-concept patch for discussion. I'd be grateful for any
feedback and/or flames.

This is a problem I ran into in my parallel hash join project. Robert
pointed it out to me and told me to go read tqueue.c for details, and
my first reaction was: I'll code around this by teaching the planner
to avoid sharing tuples from paths that produce transient record types
based on tlist analysis[1]. Aside from being a cop-out, that approach
doesn't work because the planner doesn't actually know what types the
executor might come up with since some amount of substitution for
structurally-similar records seems to be allowed[2] (though I'm not
sure I can explain that). So... we're gonna need a bigger boat.

The patch uses typcache.c's backend-private cache still, but if the
backend is currently "attached" to a shared registry then it functions
as a write though cache. There is no cache-invalidation problem
because registered typmods are never unregistered. parallel.c exports
the leader's existing record typmods into a shared registry, and
attaches to it in workers. A DSM detach hook returns backends to
private cache mode when parallelism ends.

Some thoughts:

* Maybe it would be better to have just one DSA area, rather than the
one controlled by execParallel.c (for executor nodes to use) and this
new one controlled by parallel.c (for the ParallelContext). Those
scopes are approximately the same at least in the parallel query case,
but...

* It would be nice for the SharedRecordTypeRegistry to be able to
survive longer than a single parallel query, perhaps in a per-session
DSM segment. Perhaps eventually we will want to consider a
query-scoped area, a transaction-scoped area and a session-scoped
area? I didn't investigate that for this POC.

* It seemed to be a reasonable goal to avoid allocating an extra DSM
segment for every parallel query, so the new DSA area is created
in-place. 192KB turns out to be enough to hold an empty
SharedRecordTypmodRegistry due to dsa.c's superblock allocation scheme
(that's two 64KB size class superblocks + some DSA control
information). It'll create a new DSM segment as soon as you start
using blessed records, and will do so for every parallel query you
start from then on with the same backend. Erm, maybe adding 192KB to
every parallel query DSM segment won't be popular...

* Perhaps simplehash + an LWLock would be better than dht, but I
haven't looked into that. Can it be convinced to work in DSA memory
and to grow on demand?

Here's one way to hit the new code path, so that record types blessed
in a worker are accessed from the leader:

CREATE TABLE foo AS SELECT generate_series(1, 10) AS x;
CREATE OR REPLACE FUNCTION make_record(n int)
RETURNS RECORD LANGUAGE plpgsql PARALLEL SAFE AS
$$
BEGIN
RETURN CASE n
WHEN 1 THEN ROW(1)
WHEN 2 THEN ROW(1, 2)
WHEN 3 THEN ROW(1, 2, 3)
WHEN 4 THEN ROW(1, 2, 3, 4)
ELSE ROW(1, 2, 3, 4, 5)
END;
END;
$$;
SET force_parallel_mode = 1;
SELECT make_record(x) FROM foo;

PATCH

1. Apply dht-v3.patch[3].
2. Apply shared-record-typmod-registry-v1.patch.
3. Apply rip-out-tqueue-remapping-v1.patch.

[1] https://www.postgresql.org/message-id/CAEepm%3D2%2Bzf7L_-eZ5hPW5%3DUS%2Butdo%3D9tMVD4wt7ZSM-uOoSxWg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA+TgmoZMH6mJyXX=YLSOvJ8jULFqGgXWZCr_rbkc1nJ+177VSQ@mail.gmail.com
[3] https://www.postgresql.org/message-id/flat/CAEepm%3D3d8o8XdVwYT6O%3DbHKsKAM2pu2D6sV1S_%3D4d%2BjStVCE7w%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
rip-out-tqueue-remapping-v1.patch application/octet-stream 38.3 KB
shared-record-typmod-registry-v1.patch application/octet-stream 35.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vitaly Burovoy 2017-04-07 05:26:16 Re: identity columns
Previous Message Noah Misch 2017-04-07 05:21:10 Re: SCRAM authentication, take three