Re: Redundant tuple copy in tqueueReceiveSlot()

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Redundant tuple copy in tqueueReceiveSlot()
Date: 2020-09-17 02:20:50
Message-ID: CA+hUKGL--+RFMbQXcoFpvf1Yx2qn_FjCHWM0jJEGrKBAG=pYZQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 9, 2020 at 5:23 PM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> I went ahead and tried doing this. I chose an approach where we can
> return the pointer to the in-place minimal tuple data if it's a
> heap/buffer/minimal tuple slot. A new function
> ExecFetchSlotMinimalTupleData() returns in-place minimal tuple data.
> If it's neither heap, buffer or minimal tuple, it returns a copy as
> usual. The receiver should not assume the data is directly taken from
> MinimalTupleData, so it should set it's t_len to the number of bytes
> returned. Patch attached
> (0001-Avoid-redundant-tuple-copy-while-sending-tuples-to-G.patch)

+char *
+ExecFetchSlotMinimalTupleData(TupleTableSlot *slot, uint32 *len,
+ bool *shouldFree)

Interesting approach. It's a bit of a weird interface, returning a
pointer to a non-valid MinimalTuple that requires extra tweaking after
you copy it to make it a valid one and that you're not allowed to
tweak in-place. I'd probably make that return "const void *" just for
a bit of extra documentation. I wonder if there is a way we could
make "Minimal Tuples but with the length travelling separately (and
perhaps chopped off?)" into a first-class concept... It's also a
shame to be schlepping a bunch of padding bytes around.

tuple = (MinimalTuple) data;
- Assert(tuple->t_len == nbytes);
+ tuple->t_len = nbytes;

Hmm, so you have to scribble on shared memory on the receiving side.
I wondered about a couple of different ways to share the length field
with the shm_mq envelope, but that all seems a bit too weird...

> Thomas, I guess you had a different approach in mind when you said
> about "returning either success or
> hey-that-buffer's-too-small-I-need-N-bytes". But what looks clear to

Yeah I tried some things like that, but I wasn't satisfied with any of
them; basically the extra work involved in negotiating the size was a
bit too high. On the other hand, because my interface was "please
write a MinimalTuple here!", it had the option to *form* a
MinimalTuple directly in place, whereas your approach can only avoid
creating and destroying a temporary tuple when the source is a heap
tuple.

> me is that avoiding the copy shows consistent improvement of 4 to 10%
> for simple parallel table scans. I tried my patch on both x86_64 and
> arm64, and observed this speedup on both.

I think that's a great validation of the goal but I hope we can figure
out a way that avoids the temporary tuple for more cases. FWIW I saw
hash self joins running a couple of percent faster with one of my
abandoned patches.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2020-09-17 02:27:27 Re: recovering from "found xmin ... from before relfrozenxid ..."
Previous Message Tom Lane 2020-09-17 02:03:57 Re: pg_restore causing deadlocks on partitioned tables