Re: Making aggregate deserialization (and WAL receive) functions slightly faster

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Making aggregate deserialization (and WAL receive) functions slightly faster
Date: 2023-10-25 01:03:17
Message-ID: CAApHDvqi_L68q7+3XRzULfXQNd9g7Hod+n0y8oCkuSrP64uZCA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 17 Oct 2023 at 20:39, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Mon, 16 Oct 2023 at 05:56, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I guess the next question is whether we want to stop here or
> > try to relax the requirement about NUL-termination. I'd be inclined
> > to call that a separate issue deserving a separate commit, so maybe
> > we should go ahead and commit this much anyway.
>
> I am keen to see this relaxed. I agree that a separate effort is best.

I looked at the latest posted patch again today with thoughts about
pushing it but there's something I'm a bit unhappy with that makes me
think we should maybe do the NUL-termination relaxation in the same
commit.

The problem is in LogicalRepApplyLoop() the current patch adjusts the
manual building of the StringInfoData to make use of
initReadOnlyStringInfo() instead. The problem I have with that is that
the string that's given to initReadOnlyStringInfo() comes from
walrcv_receive() and on looking at the API spec for walrcv_receive_fn
I see:

/*
* walrcv_receive_fn
*
* Receive a message available from the WAL stream. 'buffer' is a pointer
* to a buffer holding the message received. Returns the length of the data,
* 0 if no data is available yet ('wait_fd' is a socket descriptor which can
* be waited on before a retry), and -1 if the cluster ended the COPY.
*/

i.e, no mention that the buffer will be NUL terminated upon return.

Looking at pqGetCopyData3(), is see the buffer does get NUL
terminated, but without the API spec mentioning this I'm not feeling
good about going ahead with wrapping that up in
initReadOnlyStringInfo() which Asserts the buffer will be NUL
terminated.

I've attached a patch which builds on the previous patch and relaxes
the rule that the StringInfo must be NUL-terminated. The rule is
only relaxed for StringInfos that are initialized with
initReadOnlyStringInfo. On working on this I went over the locations
where we've added code to add a '\0' char to the buffer. If you look
at, for example, record_recv() and array_agg_deserialize() in master,
we modify the StringInfo's data to set a \0 at the end of the string.
I've removed that code as I *believe* this isn't required for the
type's receive function.

There's also an existing confusing comment in logicalrep_read_tuple()
which seems to think we're just setting the NUL terminator to conform
to StringInfo's practises. This is misleading as the NUL is required
for LOGICALREP_COLUMN_TEXT mode as we use the type's input function
instead of the receive function. You don't have to look very hard to
find an input function that needs a NUL terminator.

I'm a bit less confident that the type's receive function will never
need to be NUL terminated. cstring_recv() came to mind as one I should
look at, but on looking I see it's not required as it just reads the
remaining bytes from the input StringInfo. Is it safe to assume this?
or could there be some UDF receive function which requires this?

David

Attachment Content-Type Size
stringinfo_changes_v5.patch text/plain 14.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2023-10-25 01:07:20 Re: MERGE ... RETURNING
Previous Message Michael Paquier 2023-10-25 00:28:51 Re: Show version of OpenSSL in ./configure output