Re: pg_basebackup: removed an unnecessary use of memset in FindStreamingStart

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: yangyz <1197620467(at)qq(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_basebackup: removed an unnecessary use of memset in FindStreamingStart
Date: 2026-02-25 08:41:49
Message-ID: CD9F4C02-D1A0-4298-8E18-9EB14DF4DD8A@gmail.com
Lists: pgsql-hackers

> On Feb 25, 2026, at 14:31, yangyz <1197620467(at)qq(dot)com> wrote:
>
> Hi Hackers,
>
> When I read the FindStreamingStart function in pg_receivewal.c, I discovered an unnecessary use of memset. So I removed it, which improves performance without affecting functionality.
>
> The following is the detailed analysis of the reasons:
> 1. LZ4F_decompress will fully overwrite the output buffer:
> When out_size is passed as an input parameter, it denotes the size of the output buffer (outbuf). The decompression operation writes the decompressed data to outbuf. Upon function return, out_size is updated to reflect the actual number of bytes written. Notably, even in cases of partial decompression, data is written starting from the initial position of outbuf.
> 2. Performance overhead:
> In each iteration, the entire buffer of size LZ4_CHUNK_SZ (potentially several megabytes) is zero-initialized. Since these memory blocks are immediately overwritten by decompressed data, this zeroing operation constitutes an unnecessary consumption of CPU resources.
>
> Regards,
> Yang Yuanzhuo
>
>
>
> <v1-0001-Removed-an-unnecessary-use-of-memset-in-FindStrea.patch>

Looking at the code snippet:
```
while (readp < readend)
{
    size_t out_size = LZ4_CHUNK_SZ;
    size_t read_size = readend - readp;

    memset(outbuf, 0, LZ4_CHUNK_SZ);
    status = LZ4F_decompress(ctx, outbuf, &out_size,
                             readp, &read_size, &dec_opt);
    if (LZ4F_isError(status))
        pg_fatal("could not decompress file \"%s\": %s",
                 fullpath,
                 LZ4F_getErrorName(status));

    readp += read_size;
    uncompressed_size += out_size;
}
```
The loop only accumulates the uncompressed size in order to locate the start position; the decoded bytes themselves are never consumed (they’re effectively discarded). Given that LZ4F_decompress() reports the number of bytes produced via out_size, zeroing the whole output buffer beforehand doesn’t seem necessary here. And since this happens inside the loop, the extra memset() is repeated on every iteration, which just amplifies the overhead.
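
To illustrate the point, here is a minimal sketch of the same loop shape. Note that fake_decompress(), count_output() and CHUNK_SZ are hypothetical names made up for this example; fake_decompress() is only a stand-in that mimics the "update out_size and read_size on return" convention and is not the LZ4 API. The sketch only shows that when the caller uses nothing but out_size, the prior contents of the output buffer cannot affect the result:
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SZ 8              /* hypothetical, tiny for illustration */

/*
 * Hypothetical stand-in for a streaming decompressor (NOT the LZ4 API).
 * Copies up to *out_size bytes from src to dst, then sets *out_size and
 * *in_size to the number of bytes produced and consumed.
 */
static void
fake_decompress(char *dst, size_t *out_size,
                const char *src, size_t *in_size)
{
    size_t n = (*in_size < *out_size) ? *in_size : *out_size;

    memcpy(dst, src, n);
    *out_size = n;
    *in_size = n;
}

/*
 * Sum the bytes produced while draining src.  The contents of outbuf are
 * never read, so zeroing it first (zero_first != 0) cannot change the total.
 */
static size_t
count_output(const char *src, size_t len, int zero_first)
{
    char        outbuf[CHUNK_SZ];
    const char *readp = src;
    const char *readend = src + len;
    size_t      total = 0;

    while (readp < readend)
    {
        size_t out_size = CHUNK_SZ;
        size_t read_size = (size_t) (readend - readp);

        if (zero_first)
            memset(outbuf, 0, CHUNK_SZ);
        fake_decompress(outbuf, &out_size, readp, &read_size);
        assert(out_size <= CHUNK_SZ);

        readp += read_size;
        total += out_size;
    }
    return total;
}
```
Calling count_output() on the same input with zero_first set to 1 and to 0 returns the same total, which is the only value a caller like this would look at.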

Also, ReadDataFromArchiveLZ4() has a very similar loop that doesn’t zero the output buffer at all:
```
while (readp < readend)
{
    size_t out_size = DEFAULT_IO_BUFFER_SIZE;
    size_t read_size = readend - readp;

    status = LZ4F_decompress(ctx, outbuf, &out_size,
                             readp, &read_size, &dec_opt);
    if (LZ4F_isError(status))
        pg_fatal("could not decompress: %s",
                 LZ4F_getErrorName(status));

    ahwrite(outbuf, 1, out_size, AH);
    readp += read_size;
}
```

So +1 for removing the memset.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
