Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring)

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Memory leak in XLOG reader facility when decoding records (caused by WAL refactoring)
Date: 2015-07-27 23:54:31
Message-ID: CAB7nPqTev+fS7GTVXw27e6O-kzrvnhg7Z88pUZzZGfdyu7WpnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, Jul 28, 2015 at 2:32 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 07/27/2015 04:40 AM, Michael Paquier wrote:
>>
>> On Mon, Jul 27, 2015 at 12:32 AM, Andres Freund <andres(at)anarazel(dot)de>
>> wrote:
>>>
>>> On 2015-07-23 22:50:12 +0900, Michael Paquier wrote:
>>>>
>>>>
>>>> While running valgring on pg_rewind, I have noticed that each new call
>>>> to XLogReadRecord leaks XLogReaderState->main_data and
>>>> XLogReaderState->blocks[block_id].data because each one of them is
>>>> palloc'd to store respectively the main data of the record and the
>>>> data attached to each block. I think that the most correct fix to
>>>> prevent the leak is to free those pointers when calling ResetDecoder()
>>>> which is used to reset a reader state between two records.
>>>
>>>
>>> I don't think that'd be a good fix. Isn't the idea here that we only
>>> allocate new buffers when the previous one is too small? Your fix will
>>> greatly increase the memory management overhead.
>>
>>
>> OK.
>>
>>>> ==46805== 154 bytes in 1 blocks are definitely lost in loss record 103
>>
>> of 167
>>>>
>>>> ==46805== at 0x6B1B: malloc (in
>>>>
>>
>> /usr/local/Cellar/valgrind/3.10.1/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
>>>>
>>>> ==46805== by 0x10000DB2D: pg_malloc_internal (fe_memutils.c:30)
>>>> ==46805== by 0x10000DE79: palloc (fe_memutils.c:118)
>>>> ==46805== by 0x1000052A0: DecodeXLogRecord (xlogreader.c:1203)
>>>> ==46805== by 0x100003AA9: XLogReadRecord (xlogreader.c:461)
>>>> ==46805== by 0x1000022F0: extractPageMap (parsexlog.c:78)
>>>> ==46805== by 0x10000188C: main (pg_rewind.c:280)
>>>
>>>
>>> Uh. This is the leak?
>>> if (state->main_data_len > 0)
>>> {
>>> if (!state->main_data || state->main_data_len >
>>
>> state->main_data_bufsz)
>>>
>>> {
>>> if (state->main_data)g
>>> pfree(state->main_data);
>>> state->main_data_bufsz = state->main_data_len;
>>> state->main_data =
>>
>> palloc(state->main_data_bufsz); <--- here
>>>
>>> }
>>> memcpy(state->main_data, ptr, state->main_data_len);
>>> ptr += state->main_data_len;
>>> }
>>>
>>> Where/When are we leaking memory here? The previously used memory is
>>> freed before we allocate a larger buffer.
>>
>>
>> Yep, it is here, and it is not related to the calls of XLogReadRecord
>> actually, that's a problem in XLogReaderFree that relies on max_block_id,
>> which gets reinitialized to -1 between two records in ResetDecoder,
>> instead
>> of XLR_MAX_BLOCK_ID to clean up the data blocks when freeing the XLOG
>> reader (note: looking at memory leaks at night can lead to incorrect
>> conclusions). So for example if a first record uses some data blocks and
>> allocates buffers for some data, and then a second record uses no data
>> blocks, we'll leak. So that's less critical than I though as the couple
>> XLofReaderAllocate/XLogReaderFree is called only once in StartupXLOG,
>> still
>> that's a leak.
>>
>> Attached is a self-contained script able to show the problem if anybody
>> wants to check by himself (the other leaks related to pg_rewind are
>> addressed in another thread), and a patch that takes care of it.
>
>
> Thanks, committed. The leak was not actually in the "main_data" handling, as
> quoted in the above snippet, but in the per-block buffers.

The patch pushed does not take care of the leak (tested on 3d5cb31c).
With the test script I sent upthread, I am still seeing it:
==32015== 154 bytes in 1 blocks are definitely lost in loss record 103 of 155
==32015== at 0x6B1B: malloc (in
/usr/local/Cellar/valgrind/3.10.1/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==32015== by 0x10000DB2D: pg_malloc_internal (fe_memutils.c:30)
==32015== by 0x10000DE79: palloc (fe_memutils.c:118)
==32015== by 0x100005270: DecodeXLogRecord (xlogreader.c:1200)
==32015== by 0x100003A79: XLogReadRecord (xlogreader.c:458)
==32015== by 0x100002AF0: findLastCheckpoint (parsexlog.c:186)
==32015== by 0x1000017E1: main (pg_rewind.c:257)
==32015==
==32015== 154 bytes in 1 blocks are definitely lost in loss record 104 of 155
==32015== at 0x6B1B: malloc (in
/usr/local/Cellar/valgrind/3.10.1/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==32015== by 0x10000DB2D: pg_malloc_internal (fe_memutils.c:30)
==32015== by 0x10000DE79: palloc (fe_memutils.c:118)
==32015== by 0x100005270: DecodeXLogRecord (xlogreader.c:1200)
==32015== by 0x100003A79: XLogReadRecord (xlogreader.c:458)
==32015== by 0x1000022E0: extractPageMap (parsexlog.c:78)
==32015== by 0x10000187C: main (pg_rewind.c:280)

The problem is that state->max_block_id gets reset to -1 each time a
new record is read via ResetDecoder. Hence if we decode a record that
has block data and then if a second record that does not contain block
data is decoded, we will leak if XLogReaderFree is called because no
attempts will be made to pfree what has been allocated. Per se the
patch attached.
Regards,
--
Michael

Attachment Content-Type Size
20150728_xlogreader_leak_v2.patch binary/octet-stream 510 bytes

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Sandeep Thakkar 2015-07-28 05:11:26 Re: installation error please help
Previous Message Tom Lane 2015-07-27 21:37:01 Re: BUG #13442: ISBN doesn't always roundtrip with text