Re: Failed to delete old ReorderBuffer spilled files

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: atorikoshi <torikoshi_atsushi_z2(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Failed to delete old ReorderBuffer spilled files
Date: 2017-11-21 09:12:29
Message-ID: CAD21AoA+5X2W9oTihXhXj=XnizOFSyDVTTEytDN-9XskaoSbkA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 21, 2017 at 3:48 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Mon, Nov 20, 2017 at 7:35 PM, atorikoshi
> <torikoshi_atsushi_z2(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Hi,
>>
>> I put many queries into one transaction and made ReorderBuffer spill
>> data to disk, and sent SIGKILL to postgres before the end of the
>> transaction.
>>
>> After starting up postgres again, I observed the files spilled to
>> data wasn't deleted.
>>
>> I think these files should be deleted because its transaction was no
>> more valid, so no one can use these files.
>>
>>
>> Below is a reproduction instructions.
>>
>> ------------------------------------------------
>> 1. Create table and publication at publiser.
>>
>> @pub =# CREATE TABLE t1(
>> id INT PRIMARY KEY,
>> name TEXT);
>>
>> @pub =# CREATE PUBLICATION pub FOR TABLE t1;
>>
>> 2. Create table and subscription at subscriber.
>>
>> @sub =# CREATE TABLE t1(
>> id INT PRIMARY KEY,
>> name TEXT
>> );
>>
>> @sub =# CREATE SUBSCRIPTION sub
>> CONNECTION 'host=[hostname] port=[port] dbname=[dbname]'
>> PUBLICATION pub;
>>
>> 3. Put many queries into one transaction.
>>
>> @pub =# BEGIN;
>> INSERT INTO t1
>> SELECT
>> i,
>> 'aaaaaaaaaa'
>> FROM
>> generate_series(1, 1000000) as i;
>>
>> 4. Then we can see spilled files.
>>
>> @pub $ ls -1 ${PGDATA}/pg_replslot/sub/
>> state
>> xid-561-lsn-0-1000000.snap
>> xid-561-lsn-0-2000000.snap
>> xid-561-lsn-0-3000000.snap
>> xid-561-lsn-0-4000000.snap
>> xid-561-lsn-0-5000000.snap
>> xid-561-lsn-0-6000000.snap
>> xid-561-lsn-0-7000000.snap
>> xid-561-lsn-0-8000000.snap
>> xid-561-lsn-0-9000000.snap
>>
>> 5. Kill publisher's postgres process before COMMIT.
>>
>> @pub $ kill -s SIGKILL [pid of postgres]
>>
>> 6. Start publisher's postgres process.
>>
>> @pub $ pg_ctl start -D ${PGDATA}
>>
>> 7. After a while, we can see the files remaining.
>> (Immediately after starting publiser, we can not see these files.)
>>
>> @pub $ pg_ctl start -D ${PGDATA}
>>
>> When I configured with '--enable-cassert', below assertion error
>> was appeared.
>>
>> TRAP: FailedAssertion("!(txn->final_lsn != 0)", File: "reorderbuffer.c",
>> Line: 2576)
>> ------------------------------------------------
>>
>> Attached patch sets final_lsn to the last ReorderBufferChange if
>> final_lsn == 0.
>
> Thank you for the report. I could reproduce this issue with the above
> step. My analysis is, the cause of that a serialized reorder buffer
> isn't cleaned up is that the aborted transaction without an abort WAL
> record has no chance to set ReorderBufferTXN->final_lsn. So if there
> is such serialized transactions ReorderBufferRestoreCleanup cleanups
> no files, which is cause of the assertion failure (or a file being
> orphaned). What do you think?
>
> On detail of your patch, I'm not sure it's safe if we set the lsn of
> other than commit record or abort record to final_lsn. The comment in
> reorderbuffer.h says,
>
> typedef trcut ReorderBufferTXN
> {
> (snip)
>
> /* ----
> * LSN of the record that lead to this xact to be committed or
> * aborted. This can be a
> * * plain commit record
> * * plain commit record, of a parent transaction
> * * prepared transaction commit
> * * plain abort record
> * * prepared transaction abort
> * * error during decoding
> * ----
> */
> XLogRecPtr final_lsn;
>
> But with your patch, we could set a lsn of a record that is other than
> what listed above to final_lsn. One way I came up with is to make
> ReorderBufferRestoreCleanup accept an invalid value of final_lsn and
> regards it as a aborted transaction that doesn't has a abort WAL
> record. So we can cleanup all serialized files if final_lsn of a
> transaction is invalid.

After more thought, since there are some codes cosmetically setting
final_lsn when the fate of transaction is determined possibly we
should not accept a invalid value of final_lsn even in the case.

> Since I'm not very familiar with snapshot
> building part please check it.

Sorry I wanted to say the reorderbuffer module. The snapshot building
isn't relevant.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2017-11-21 09:14:11 Re: [HACKERS] CLUSTER command progress monitor
Previous Message Amit Khandekar 2017-11-21 08:52:57 Re: [HACKERS] Parallel Append implementation