Re: Failed to delete old ReorderBuffer spilled files

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: torikoshi_atsushi_z2(at)lab(dot)ntt(dot)co(dot)jp
Cc: sawada(dot)mshk(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Failed to delete old ReorderBuffer spilled files
Date: 2017-11-22 00:03:54
Message-ID: 20171122.090354.14773778.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Tue, 21 Nov 2017 20:53:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20171121(dot)205304(dot)90315453(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> At Tue, 21 Nov 2017 20:27:25 +0900, atorikoshi <torikoshi_atsushi_z2(at)lab(dot)ntt(dot)co(dot)jp> wrote in <d5dddea9-4182-a7e4-f368-156f5470d244(at)lab(dot)ntt(dot)co(dot)jp>
> > Thanks for reviewing!
> >
> >
> > On 2017/11/21 18:12, Masahiko Sawada wrote:
> > > On Tue, Nov 21, 2017 at 3:48 PM, Masahiko Sawada
> > > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >> On Mon, Nov 20, 2017 at 7:35 PM, atorikoshi
> > >> <torikoshi_atsushi_z2(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > >>> Hi,
> > >>>
> > >>> I put many queries into one transaction and made ReorderBuffer spill
> > >>> data to disk, and sent SIGKILL to postgres before the end of the
> > >>> transaction.
> > >>>
> > >>> After starting up postgres again, I observed the files spilled to
> > >>> data wasn't deleted.
> > >>>
> > >>> I think these files should be deleted because its transaction was no
> > >>> more valid, so no one can use these files.
> > >>>
> > >>>
> > >>> Below is a reproduction instructions.
> > >>>
> > >>> ------------------------------------------------
> > >>> 1. Create table and publication at publiser.
> > >>>
> > >>> @pub =# CREATE TABLE t1(
> > >>> id INT PRIMARY KEY,
> > >>> name TEXT);
> > >>>
> > >>> @pub =# CREATE PUBLICATION pub FOR TABLE t1;
> > >>>
> > >>> 2. Create table and subscription at subscriber.
> > >>>
> > >>> @sub =# CREATE TABLE t1(
> > >>> id INT PRIMARY KEY,
> > >>> name TEXT
> > >>> );
> > >>>
> > >>> @sub =# CREATE SUBSCRIPTION sub
> > >>> CONNECTION 'host=[hostname] port=[port] dbname=[dbname]'
> > >>> PUBLICATION pub;
> > >>>
> > >>> 3. Put many queries into one transaction.
> > >>>
> > >>> @pub =# BEGIN;
> > >>> INSERT INTO t1
> > >>> SELECT
> > >>> i,
> > >>> 'aaaaaaaaaa'
> > >>> FROM
> > >>> generate_series(1, 1000000) as i;
> > >>>
> > >>> 4. Then we can see spilled files.
> > >>>
> > >>> @pub $ ls -1 ${PGDATA}/pg_replslot/sub/
> > >>> state
> > >>> xid-561-lsn-0-1000000.snap
> > >>> xid-561-lsn-0-2000000.snap
> > >>> xid-561-lsn-0-3000000.snap
> > >>> xid-561-lsn-0-4000000.snap
> > >>> xid-561-lsn-0-5000000.snap
> > >>> xid-561-lsn-0-6000000.snap
> > >>> xid-561-lsn-0-7000000.snap
> > >>> xid-561-lsn-0-8000000.snap
> > >>> xid-561-lsn-0-9000000.snap
> > >>>
> > >>> 5. Kill publisher's postgres process before COMMIT.
> > >>>
> > >>> @pub $ kill -s SIGKILL [pid of postgres]
> > >>>
> > >>> 6. Start publisher's postgres process.
> > >>>
> > >>> @pub $ pg_ctl start -D ${PGDATA}
> > >>>
> > >>> 7. After a while, we can see the files remaining.
> > >>> (Immediately after starting publiser, we can not see these files.)
> > >>>
> > >>> @pub $ pg_ctl start -D ${PGDATA}
> > >>>
> > >>> When I configured with '--enable-cassert', below assertion error
> > >>> was appeared.
> > >>>
> > >>> TRAP: FailedAssertion("!(txn->final_lsn != 0)", File:
> > >>> "reorderbuffer.c",
> > >>> Line: 2576)
> > >>> ------------------------------------------------
> > >>>
> > >>> Attached patch sets final_lsn to the last ReorderBufferChange if
> > >>> final_lsn == 0.
> > >>
> > >> Thank you for the report. I could reproduce this issue with the above
> > >> step. My analysis is, the cause of that a serialized reorder buffer
> > >> isn't cleaned up is that the aborted transaction without an abort WAL
> > >> record has no chance to set ReorderBufferTXN->final_lsn. So if there
> > >> is such serialized transactions ReorderBufferRestoreCleanup cleanups
> > >> no files, which is cause of the assertion failure (or a file being
> > >> orphaned). What do you think?
> > >>
> > >> On detail of your patch, I'm not sure it's safe if we set the lsn of
> > >> other than commit record or abort record to final_lsn. The comment in
> > >> reorderbuffer.h says,
> > >>
> > >> typedef trcut ReorderBufferTXN
> > >> {
> > >> (snip)
> > >>
> > >> /* ----
> > >> * LSN of the record that lead to this xact to be committed or
> > >> * aborted. This can be a
> > >> * * plain commit record
> > >> * * plain commit record, of a parent transaction
> > >> * * prepared transaction commit
> > >> * * plain abort record
> > >> * * prepared transaction abort
> > >> * * error during decoding
> > >> * ----
> > >> */
> > >> XLogRecPtr final_lsn;
> > >>
> > >> But with your patch, we could set a lsn of a record that is other than
> > >> what listed above to final_lsn. One way I came up with is to make
> >
> > I added some comments on final_lsn.
> >
> > >> ReorderBufferRestoreCleanup accept an invalid value of final_lsn and
> > >> regards it as a aborted transaction that doesn't has a abort WAL
> > >> record. So we can cleanup all serialized files if final_lsn of a
> > >> transaction is invalid.
> > >
> > > After more thought, since there are some codes cosmetically setting
> > > final_lsn when the fate of transaction is determined possibly we
> > > should not accept a invalid value of final_lsn even in the case.
> > >
>
> It doesn't seem just a cosmetic. The first/last_lsn are used to
> determin the files to be deleted. On the other hand the TXN
> cannot have last_lsn since it hasn't see abort/commit record.
>
> > My new patch keeps setting final_lsn, but changed its location to the
> > top of ReorderBufferCleanupTXN().
> > I think it's a kind of preparation, so doing it at the top improves
> > readability.
> >
> > > Anyway I think you should register this patch to the next commit fest
> > > so as not forget.
> >
> > Thanks for you advice, I've registered this issue as a bug.
>
> Using last changing LSN might work but I'm afraid that that fails
> to remove the last snap file if the crash happens at the very
> start of a segment.
>
> Anyway all files of the transaction is no longer useless at the
> time, but it seems that the last_lsn is required to avoid
> directory scanning at every transaction end.
>
> Letting ReorderBufferAbortOld scan the directory and determine
> the first and last LSN then set to the txn would work but it
> might be an overkill. Using the beginning LSN of the next segment
> of the last_change->lsn could surely work... really?
> (ReorderBufferRestoreCleanup doesn't complain on ENOENT.)

Somehow I deleted exessively while editing. One more possible
solution is making ReorderBufferAbortOld take final LSN and
DecodeStandbyOp passes the LSN of XLOG_RUNNING_XACTS record to
it.

> By the way, just using unlink() there might lead to the revmoed
> file's resurrection but it would be another issue.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-11-22 00:21:03 Re: Anybody care about having the verbose form of the tzdata files?
Previous Message Thomas Munro 2017-11-21 23:38:38 Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys