Quick Links

Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring)

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject:	Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring)
Date:	2015-07-27 01:40:47
Message-ID:	CAB7nPqRXABT05BxnVNHY+s_DabqJpOm0CcZ99+CVZ6v_Zy6x8Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

On Mon, Jul 27, 2015 at 12:32 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2015-07-23 22:50:12 +0900, Michael Paquier wrote:
>>
>> While running valgring on pg_rewind, I have noticed that each new call
>> to XLogReadRecord leaks XLogReaderState->main_data and
>> XLogReaderState->blocks[block_id].data because each one of them is
>> palloc'd to store respectively the main data of the record and the
>> data attached to each block. I think that the most correct fix to
>> prevent the leak is to free those pointers when calling ResetDecoder()
>> which is used to reset a reader state between two records.
>
> I don't think that'd be a good fix. Isn't the idea here that we only
> allocate new buffers when the previous one is too small? Your fix will
> greatly increase the memory management overhead.

OK.

>> ==46805== 154 bytes in 1 blocks are definitely lost in loss record 103
of 167
>> ==46805== at 0x6B1B: malloc (in
>>
/usr/local/Cellar/valgrind/3.10.1/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
>> ==46805== by 0x10000DB2D: pg_malloc_internal (fe_memutils.c:30)
>> ==46805== by 0x10000DE79: palloc (fe_memutils.c:118)
>> ==46805== by 0x1000052A0: DecodeXLogRecord (xlogreader.c:1203)
>> ==46805== by 0x100003AA9: XLogReadRecord (xlogreader.c:461)
>> ==46805== by 0x1000022F0: extractPageMap (parsexlog.c:78)
>> ==46805== by 0x10000188C: main (pg_rewind.c:280)
>
> Uh. This is the leak?
> if (state->main_data_len > 0)
> {
> if (!state->main_data || state->main_data_len >
state->main_data_bufsz)
> {
> if (state->main_data)
> pfree(state->main_data);
> state->main_data_bufsz = state->main_data_len;
> state->main_data =
palloc(state->main_data_bufsz); <--- here
> }
> memcpy(state->main_data, ptr, state->main_data_len);
> ptr += state->main_data_len;
> }
>
> Where/When are we leaking memory here? The previously used memory is
> freed before we allocate a larger buffer.

Yep, it is here, and it is not related to the calls of XLogReadRecord
actually, that's a problem in XLogReaderFree that relies on max_block_id,
which gets reinitialized to -1 between two records in ResetDecoder, instead
of XLR_MAX_BLOCK_ID to clean up the data blocks when freeing the XLOG
reader (note: looking at memory leaks at night can lead to incorrect
conclusions). So for example if a first record uses some data blocks and
allocates buffers for some data, and then a second record uses no data
blocks, we'll leak. So that's less critical than I though as the couple
XLofReaderAllocate/XLogReaderFree is called only once in StartupXLOG, still
that's a leak.

Attached is a self-contained script able to show the problem if anybody
wants to check by himself (the other leaks related to pg_rewind are
addressed in another thread), and a patch that takes care of it.
Regards,
--
Michael

Attachment	Content-Type	Size
rewind_test.bash	application/octet-stream	1.5 KB
20150727_xlogreader_leak.patch	binary/octet-stream	698 bytes

In response to

Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring) at 2015-07-26 15:32:48 from Andres Freund

Responses

Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring) at 2015-07-27 17:32:42 from Heikki Linnakangas

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	evgenyfr	2015-07-27 07:10:48	BUG #13519: psql: SSPI continuation error
Previous Message	Andres Freund	2015-07-26 15:32:48	Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring)