Re: Making aggregate deserialization (and WAL receive) functions slightly faster

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Making aggregate deserialization (and WAL receive) functions slightly faster
Date: 2023-02-12 06:39:13
Message-ID: 770055.1676183953@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> While working on 16fd03e95, I noticed that in each aggregate
> deserialization function, in order to "receive" the bytea value that
> is the serialized aggregate state, appendBinaryStringInfo is used to
> append the bytes of the bytea value onto a temporary StringInfoData.
> Using appendBinaryStringInfo seems a bit wasteful here. We could
> really just fake up a StringInfoData and point directly to the bytes
> of the bytea value.

Perhaps, but ...

> The best way I could think of to do this was to invent
> initStringInfoFromString() which initialises a StringInfoData and has
> the ->data field point directly at the specified buffer. This will
> mean that it would be unsafe to do any appendStringInfo* operations on
> the resulting StringInfoData as enlargeStringInfo would try to
> repalloc the data buffer, which might not even point to a palloc'd
> string.

I find this patch horribly dangerous.

It could maybe be okay if we added the capability for StringInfoData
to understand (and enforce) that its "data" buffer is read-only.
However, that'd add overhead to every existing use-case.

> I've attached the benchmark results I got after testing how the
> modification changed the performance of string_agg_deserialize().
> I was hoping this would have a slightly more impressive performance
> impact, especially for string_agg() and array_agg() as the aggregate
> states of those can be large. However, in the test I ran, there's
> only a very slight performance gain. I may just not have found the
> best case, however.

I do not think we should even consider this without solid evidence
for *major* performance improvements. As it stands, it's a
quintessential example of a loaded foot-gun, and it seems clear
that making it safe enough to use would add more overhead than
it saves.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ankit Kumar Pandey 2023-02-12 08:13:27 Re: Sort optimizations: Making in-memory sort cache-aware
Previous Message David Rowley 2023-02-12 05:38:36 Making aggregate deserialization (and WAL receive) functions slightly faster