Re: POC: Sharing record typmods between backends

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Sharing record typmods between backends
Date: 2017-07-10 09:39:09
Message-ID: CAEepm=1Z+GEAg=LaKPe02FmzWsLGMS14wkcT=EyQ129hv5xPxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 1, 2017 at 6:29 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On May 31, 2017 11:28:18 AM PDT, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On 2017-05-31 13:27:28 -0400, Dilip Kumar wrote:
[ ... various discussion in support of using DHT ... ]

Ok, good.

Here's a new version that introduces a per-session DSM segment to hold
the shared record typmod registry (and maybe more things later). The
per-session segment is created the first time you run a parallel query
(though there is handling for failure to allocate that allows the
parallel query to continue with no workers) and lives until your
leader backend exits. When parallel workers start up, they see its
handle in the per-query segment and attach to it, which puts
typcache.c into write-through cache mode so their idea of record
typmods stays in sync with the leader (and each other).

I also noticed that I could delete even more of tqueue.c than before:
it doesn't seem to have any remaining reason to need to know the
TupleDesc.

One way to test this code is to apply just
0003-rip-out-tqueue-remapping-v3.patch and then try the example from
the first message in this thread to see it break, and then try again
with the other two patches applied. By adding debugging trace you can
see that the worker pushes a bunch of TupleDescs into shmem, they get
pulled out by the leader when it sees the tuples, and then on a second
invocation the (new) worker can reuse them: it finds matches already
in shmem from the first invocation.

I used a DSM segment with a TOC and a DSA area inside that, like the
existing per-query DSM segment, but obviously you could spin it
various different ways. One example: just have a DSA area and make a
new kind of TOC thing that deals in dsa_pointers. Better ideas?

I believe combo CIDs should also go in there, to enable parallel
write, but I'm not 100% sure: that's neither per-session nor per-query
data, that's per-transaction. So perhaps the per-session DSM could
hold a per-session DSA and a per-transaction DSA, where the latter is
reset for each transaction, just like TopTransactionContext (though
dsa.c doesn't have a 'reset thyself' function currently). That seems
like a good place to store a shared combo CID hash table using DHT.
Thoughts?

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
shared-record-typmod-registry-v3.patchset.tgz application/x-gzip 32.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2017-07-10 09:57:40 Re: hash index on unlogged tables doesn't behave as expected
Previous Message Heikki Linnakangas 2017-07-10 09:30:47 Re: List of hostaddrs not supported