Re: Failed to delete old ReorderBuffer spilled files

From: atorikoshi <torikoshi_atsushi_z2(at)lab(dot)ntt(dot)co(dot)jp>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, thomas(dot)munro(at)enterprisedb(dot)com
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Failed to delete old ReorderBuffer spilled files
Date: 2017-11-24 01:20:23
Message-ID: b64e5436-67ee-c72e-6658-c8bf6a43934e@lab.ntt.co.jp
Lists: pgsql-hackers

On 2017/11/22 16:49, Masahiko Sawada wrote:
> On Wed, Nov 22, 2017 at 2:56 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> On 22 November 2017 at 12:15, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>
>>> At Wed, 22 Nov 2017 12:57:34 +0900, Michael Paquier
>>> <michael(dot)paquier(at)gmail(dot)com> wrote in
>>> <CAB7nPqQP52cLEUZJv-1MoCiejNYQ4CGs=tzwhP2oEErvv7R3Bg(at)mail(dot)gmail(dot)com>
>>>> On Wed, Nov 22, 2017 at 11:49 AM, Craig Ringer <craig(at)2ndquadrant(dot)com>
>>>> wrote:
>>>>> On 20 November 2017 at 18:35, atorikoshi
>>>>> <torikoshi_atsushi_z2(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>>>> I put many queries into one transaction and made ReorderBuffer spill
>>>>>> data to disk, and sent SIGKILL to postgres before the end of the
>>>>>> transaction.
>>>>>>
>>>>>> After starting up postgres again, I observed the files spilled to
>>>>>> disk weren't deleted.
>>>>>
>>>>> Since this can only happen on crash exits, and the reorderbuffer data
>>>>> is useless after a decoding backend exits, why don't we just
>>>>> recursively delete the tree contents on Pg startup?
>>>>
>>>> +1. You would just need an extra step after, say,
>>>> DeleteAllExportedSnapshotFiles() in startup.c. Looks saner to me to do
>>>> so as well.
>>>
>>> The old files are being removed at startup by
>>> StartupReorderBuffer.
>>
>>
>> That seems to contradict the statement above, that "after starting up
>> postgres again, I observed the files spilled to disk weren't deleted".
>>
>>>
>>> At the time of the assertion failure, the files are not from the
>>> previous run; they are created after reconnection from the
>>> subscriber.
>>
>>
>> Are you saying the problem files do not exist when we start up, but are
>> created and then leaked after logical decoding resumes after an unclean
>> startup?
>>
>> ... Yes, that does appear to be the case, per the original report:
>>
>> "7. After a while, we can see the files remaining.
>> (Immediately after starting publiser, we can not see these files.)"
>>
>> I was confused by "remaining". They're not remaining, they've been
>> re-created.

Sorry for my incorrect description; I should have said re-created.

>>
>> But if they're re-created, why are they not re-created correctly after an
>> unclean shutdown? What persistent state is causing that? We should be
>> clobbering saved reorder buffer temp files, snapshots, etc. at startup.
>> The slot state is pretty simple, it'd just be a bit behind.
>>
>> The key difference seems to be that we hard-kill the server so it can't
>> write anything to clog. The xact is implicitly aborted, we never wrote any
>> xlog record for a commit or abort. The problem is presumably with decoding
>> xacts that were implicitly aborted by server crash, after we restart the
>> server and resume decoding.
>>
>> The assertion failure reported is in ReorderBufferRestoreCleanup, which
>> makes sense.
>>
>> Because there's no xlog record of the crash, we never set the buffer's
>> final_lsn in ReorderBufferCommit or ReorderBufferAbort.
>
> Yeah, that's the same as my analysis.
>
>> Note the comment there:
>>
>> * NB: Transactions handled here have to have actively aborted (i.e. have
>> * produced an abort record). Implicitly aborted transactions are handled via
>> * ReorderBufferAbortOld(); transactions we're just not interested in, but
>> * which have committed are handled in ReorderBufferForget().
>>
>> That's only called from DecodeStandbyOp in response to an xl_running_xacts.
>>
>> Here's the backtrace.
>>
>> Core was generated by `postgres: wal sender process postgres [local'.
>> Program terminated with signal SIGABRT, Aborted.
>> ...
>> #2 0x00000000008537b7 in ExceptionalCondition
>> (conditionName=conditionName(at)entry=0x9fcdf6 "!(txn->final_lsn != 0)",
>> errorType=errorType(at)entry=0x89bcb4 "FailedAssertion",
>> fileName=fileName(at)entry=0x9fcd04 "reorderbuffer.c",
>> lineNumber=lineNumber(at)entry=2576) at assert.c:54
>> #3 0x00000000006fec02 in ReorderBufferRestoreCleanup
>> (rb=rb(at)entry=0x1b4c370, txn=txn(at)entry=0x1b5c3b8) at reorderbuffer.c:2576
>> #4 0x0000000000700693 in ReorderBufferCleanupTXN (rb=rb(at)entry=0x1b4c370,
>> txn=txn(at)entry=0x1b5c3b8) at reorderbuffer.c:1145
>> #5 0x0000000000701516 in ReorderBufferAbortOld (rb=0x1b4c370,
>> oldestRunningXid=558) at reorderbuffer.c:1730
>> #6 0x00000000006f5a47 in DecodeStandbyOp (ctx=0x1af9ce0,
>> buf=buf(at)entry=0x7ffd11761200) at decode.c:325
>> #7 0x00000000006f65bf in LogicalDecodingProcessRecord (ctx=<optimized out>,
>> record=<optimized out>) at decode.c:117
>> #8 0x00000000007098ab in XLogSendLogical () at walsender.c:2766
>> #9 0x000000000070a875 in WalSndLoop (send_data=send_data(at)entry=0x709857
>> <XLogSendLogical>) at walsender.c:2134
>> #10 0x000000000070b011 in StartLogicalReplication (cmd=cmd(at)entry=0x1a9cd68)
>> at walsender.c:1101
>> #11 0x000000000070b46f in exec_replication_command
>> (cmd_string=cmd_string(at)entry=0x1afec30 "START_REPLICATION SLOT \"sub\"
>> LOGICAL 0/0 (proto_version '1', publication_names '\"pub\"')") at
>> walsender.c:1527
>> #12 0x0000000000758809 in PostgresMain (argc=<optimized out>,
>> argv=argv(at)entry=0x1aab870, dbname=<optimized out>, username=<optimized out>)
>> at postgres.c:4086
>> #13 0x00000000006e178d in BackendRun (port=port(at)entry=0x1aa3430) at
>> postmaster.c:4357
>> #14 0x00000000006e35e9 in BackendStartup (port=port(at)entry=0x1aa3430) at
>> postmaster.c:4029
>> #15 0x00000000006e39e3 in ServerLoop () at postmaster.c:1753
>> #16 0x00000000006e4b36 in PostmasterMain (argc=argc(at)entry=3,
>> argv=argv(at)entry=0x1a7c5d0) at postmaster.c:1361
>> #17 0x000000000065d093 in main (argc=3, argv=0x1a7c5d0) at main.c:228
>>
>> So it's clear why we can call ReorderBufferAbortOld with no final_lsn set.
>>
>> I think your approach of setting final_lsn to the last buffered action is
>> fine.
>>
>> I wonder if we should be setting it in ReorderBufferAbortOld though, when we
>> know for sure that we're seeing final_lsn==0 in response to this specific
>> situation. That way we don't mask other potential bugs the assertions are
>> there to catch.
>>
>
> +1

+1, I've changed the location.
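
To make the discussion concrete, the core of the change now lives in
ReorderBufferAbortOld() and boils down to picking up the LSN of the last
queued change. Below is a simplified sketch of the idea, not the literal
hunk from the attached patch (helper and field names are the existing ones
from reorderbuffer.h and ilist.h); the txn->serialized guard discussed
further below is not yet included here:

    /* In ReorderBufferAbortOld(), before cleaning up an old transaction: */
    if (txn->final_lsn == InvalidXLogRecPtr)
    {
        ReorderBufferChange *last_change;

        /*
         * A crashed transaction never produced a commit/abort record, so
         * substitute the LSN of its last queued change, which lets
         * ReorderBufferRestoreCleanup() find the spilled files.
         */
        last_change = dlist_tail_element(ReorderBufferChange, node,
                                         &txn->changes);
        txn->final_lsn = last_change->lsn;
    }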

>
> + /*
> + * Before cleaning up, do a preparation.
> + * If this transaction encountered crash during transaction,
> + * txn->final_lsn remains initial value.
> + * To properly remove entries which were spilled to disk, we need valid
> + * final_lsn.
> + * So set final_lsn to the lsn of last ReorderBufferChange.
> + */
>
> ISTM the comment is slightly unclear. How about something like "The
> final_lsn of a transaction that hasn't produced an abort record is
> invalid. To ensure the serialized changes of such a transaction are
> cleaned up, we set the LSN of its last change action to final_lsn."?
>
> -------
> @@ -168,6 +168,8 @@ typedef struct ReorderBufferTXN
> * * plain abort record
> * * prepared transaction abort
> * * error during decoding
> + * Only when this xact encountered server crash, this value is set to
> + * the lsn of last ReorderBufferChange for cleaning up spilled files.
> * ----
> */
> XLogRecPtr final_lsn;
>
> Also, how about something like "Note that this can also be the LSN of
> the last change action of this xact if it is an implicitly aborted
> transaction."? Since I'm not a native speaker, the wording might be off
> though.
>

+1, I've changed comments using your proposal.
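
With both of your suggestions folded in, the hunk above final_lsn reads
roughly as follows (paraphrasing here; the exact wording is in the
attached set_final_lsn_3.patch):

     * * plain abort record
     * * prepared transaction abort
     * * error during decoding
    + * Note that this can also be the LSN of the last change action of
    + * this xact if it is an implicitly aborted transaction; in that case
    + * it is only used to clean up the changes spilled to disk.
     * ----
     */
    XLogRecPtr final_lsn;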

> Hi Torikoshi-san,
>
> FYI "make check" in contrib/test_decoding fails a couple of isolation
> tests, one with an assertion failure for my automatic patch tester[1].
> Same result on my laptop:
>
> test ondisk_startup ... FAILED (test process exited with exit code 1)
> test concurrent_ddl_dml ... FAILED (test process exited with exit code 1)
>
> TRAP: FailedAssertion("!(!dlist_is_empty(head))", File:
> "../../../../src/include/lib/ilist.h", Line: 458)

Thanks for letting me know.
I think I only ran "make check" at the top directory; sorry for the
insufficient testing.

The test failure happened at the beginning of replication (creating the
slot): there were no changes yet, so getting the tail of the changes
list failed.

Because the bug we are fixing only happens after changes have been
spilled to files, I've added "txn->serialized" to the if statement
condition.

Regards,

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: set_final_lsn_3.patch (text/plain, 1.5 KB)
