Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2019-11-04 11:51:57
Message-ID: CAA4eK1KW8KzhnNiLk3ayKUA4CkVNb_fm8USqXDw0nUK_0togJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 30, 2019 at 9:38 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Tue, Oct 22, 2019 at 10:52 PM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >
> > I think the patch should do the simplest thing possible, i.e. what it
> > does today. Otherwise we'll never get it committed.
> >
> I found a couple of crashes while reviewing and testing flushing of
> open transaction data:
>

Thanks for doing these tests. However, I don't think these issues are
anyway related to this patch. It seems to be base code issues
manifested by this patch. See my analysis below.

> Issue 1:
> #0 0x00007f22c5722337 in raise () from /lib64/libc.so.6
> #1 0x00007f22c5723a28 in abort () from /lib64/libc.so.6
> #2 0x0000000000ec5390 in ExceptionalCondition
> (conditionName=0x10ea814 "!dlist_is_empty(head)", errorType=0x10ea804
> "FailedAssertion",
> fileName=0x10ea7e0 "../../../../src/include/lib/ilist.h",
> lineNumber=458) at assert.c:54
> #3 0x0000000000b4fb91 in dlist_tail_element_off (head=0x19e4db8,
> off=64) at ../../../../src/include/lib/ilist.h:458
> #4 0x0000000000b546d0 in ReorderBufferAbortOld (rb=0x191b6b0,
> oldestRunningXid=3834) at reorderbuffer.c:1966
> #5 0x0000000000b3ca03 in DecodeStandbyOp (ctx=0x19af990,
> buf=0x7ffcbc26dc50) at decode.c:332
>

This seems to be the problem of base code where we abort immediately
after serializing the changes because in that case, the changes list
will be empty. I think you can try to reproduce it via the debugger
or by hacking the code such that it serializes after every change and
then if you abort after one change, it should hit this problem.

>
> Issue 2:
> #0 0x00007f1d7ddc4337 in raise () from /lib64/libc.so.6
> #1 0x00007f1d7ddc5a28 in abort () from /lib64/libc.so.6
> #2 0x0000000000ec4e1d in ExceptionalCondition
> (conditionName=0x10ead30 "txn->final_lsn != InvalidXLogRecPtr",
> errorType=0x10ea284 "FailedAssertion",
> fileName=0x10ea2d0 "reorderbuffer.c", lineNumber=3052) at assert.c:54
> #3 0x0000000000b577e0 in ReorderBufferRestoreCleanup (rb=0x2ae36b0,
> txn=0x2bafb08) at reorderbuffer.c:3052
> #4 0x0000000000b52b1c in ReorderBufferCleanupTXN (rb=0y x2ae36b0,
> txn=0x2bafb08) at reorderbuffer.c:1318
> #5 0x0000000000b5279d in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> txn=0x2b9d778) at reorderbuffer.c:1257
> #6 0x0000000000b5475c in ReorderBufferAbortOld (rb=0x2ae36b0,
> oldestRunningXid=3835) at reorderbuffer.c:1973
>

This seems to be again the problem with base code as we don't update
the final_lsn for subtransactions during ReorderBufferAbortOld. This
can also be reproduced with some hacking in code or via debugger in a
similar way as explained for the previous problem but with a
difference that there must be subtransaction involved in this case.

> #7 0x0000000000b3ca03 in DecodeStandbyOp (ctx=0x2b676d0,
> buf=0x7ffcbc74cc00) at decode.c:332
> #8 0x0000000000b3c208 in LogicalDecodingProcessRecord (ctx=0x2b676d0,
> record=0x2b67990) at decode.c:121
> #9 0x0000000000b70b2b in XLogSendLogical () at walsender.c:2845
>
> These failures come randomly.
> I'm not able to reproduce this issue with simple test case.

Yeah, it appears to be difficult to reproduce unless you hack the code
to serialize every change or use debugger to forcefully flush the
changes every time.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Grigory Smolkin 2019-11-04 13:03:38 [proposal] recovery_target "latest"
Previous Message Jehan-Guillaume de Rorthais 2019-11-04 11:42:35 Re: pg_waldump erroneously outputs newline for FPWs, and another minor bug