Re: [Incident report]Backend process crashed when executing 2pc transaction

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: LIANGBO <liangboa(at)suning(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [Incident report]Backend process crashed when executing 2pc transaction
Date: 2019-11-28 04:47:28
Message-ID: 20191128044728.GV237562@paquier.xyz
Lists: pgsql-hackers

On Thu, Nov 28, 2019 at 01:24:00PM +0900, Amit Langote wrote:
> Have you considered *also* reporting this to Citus developers, because
> while the crash seems to have occurred in the core PostgreSQL code
> they may have a better chance reproducing this if at all.

It is hard to draw a firm conclusion from the information at hand.
Still, if you look at the backtrace, it complains about readRecordBuf
having already been freed. That buffer is freed only if it is not NULL,
and only when the reader itself is freed. This area is used only as a
temporary buffer for the record being read, and it may optionally get
extended. Please note as well that the stack trace mentions
multi_ProcessUtility(), which is not Postgres code. So my gut tells me
that this is a Citus-only bug, and that there is an issue with some
memory context cleanup in a xact callback or such. Just a guess, but
this could explain why the memory area of readRecordBuf just went
magically away.
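
To make that guess concrete, here is a toy model of the failure mode I
have in mind. This is not Postgres or Citus code: ToyContext, ToyReader
and simulate() are hypothetical names invented for illustration, and the
real allocator behaves differently in detail. It only shows how a buffer
that is freed solely when its reader is closed can still end up "already
freed" if something else resets the context that owns it.

```python
class ToyContext:
    """Toy stand-in for a memory context: owns chunks and can drop them all."""

    def __init__(self):
        self.live = set()
        self._next_id = 0

    def alloc(self):
        self._next_id += 1
        self.live.add(self._next_id)
        return self._next_id

    def free(self, chunk):
        if chunk not in self.live:
            raise RuntimeError(f"chunk {chunk} already freed")
        self.live.remove(chunk)

    def reset(self):
        # Drops every chunk without telling whoever still holds a handle.
        self.live.clear()


class ToyReader:
    """Toy reader whose record buffer is freed only when the reader is closed."""

    def __init__(self, ctx):
        self.ctx = ctx
        self.record_buf = None

    def read_record(self):
        if self.record_buf is None:
            self.record_buf = self.ctx.alloc()

    def close(self):
        # Mirrors the "only if it is not NULL" rule described above.
        if self.record_buf is not None:
            self.ctx.free(self.record_buf)
            self.record_buf = None


def simulate(callback_resets_context):
    ctx = ToyContext()
    reader = ToyReader(ctx)
    reader.read_record()
    if callback_resets_context:
        ctx.reset()  # hypothetical misbehaving cleanup callback
    try:
        reader.close()
        return "ok"
    except RuntimeError as exc:
        return str(exc)
```

With simulate(False) the reader frees its buffer normally. With
simulate(True) the reader still holds a non-NULL handle, so it tries to
free a chunk that the context has already dropped.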

If you can produce a test case with just Postgres, that's another
story of course. If it were a bug in Postgres, I would imagine that a
simple pgbench test running a lot of 2PC transactions in parallel
would be able to reproduce it after some time.
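
As a rough sketch of such a test, the helper below builds a custom
pgbench script plus a command line. Several things here are
assumptions: the function names, the 'twopc_' GID prefix, the client
count and the duration are made up for illustration, the script expects
the standard tables created by "pgbench -i", and the server needs
max_prepared_transactions set at least as high as the number of
clients. As far as I know pgbench substitutes :client_id even inside the
quoted GID, which keeps the GIDs of concurrent clients distinct.

```python
def build_2pc_pgbench_script():
    """Return a custom pgbench script running one 2PC transaction per iteration."""
    return "\n".join([
        r"\set aid random(1, 100000 * :scale)",
        r"\set delta random(-5000, 5000)",
        "BEGIN;",
        "UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;",
        # One in-flight prepared transaction per client, named after the client.
        "PREPARE TRANSACTION 'twopc_:client_id';",
        "COMMIT PREPARED 'twopc_:client_id';",
        "",
    ])


def build_pgbench_command(script_path, clients=32, seconds=600, dbname="postgres"):
    """Return an argv list for pgbench; every value here is illustrative."""
    return [
        "pgbench",
        "-n",                         # skip the vacuum step before the run
        "-c", str(clients),           # concurrent client sessions
        "-j", str(min(clients, 8)),   # worker threads, never more than clients
        "-T", str(seconds),           # run for a fixed duration
        "-f", script_path,            # custom script instead of the built-in ones
        dbname,
    ]
```

Writing the script to a file and passing the argv list to
subprocess.run() would start the run. Whether this ever hits the crash
is exactly the open question.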
--
Michael
