Re: Postgres crashes at memcopy() after upgrade to PG 13.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Avinash Kumar <avinash(dot)vallarapu(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Postgres crashes at memcopy() after upgrade to PG 13.
Date: 2021-03-16 17:02:46
Message-ID: CAH2-WzmCRcAO=+MQ7ujabcO0-45r4qTgrZ9M4S8TGWKtZ84trw@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Tue, Mar 16, 2021 at 9:50 AM Avinash Kumar
<avinash(dot)vallarapu(at)gmail(dot)com> wrote:
> Yes, it was on the failover server where the issue is currently seen. Took a snapshot of the data directory so that the issue can be analyzed.

I would be very cautious when using LVM snapshots with a Postgres data
directory, or VM-based snapshotting tools. There are many things that
can go wrong with these tools, which are usually not sensitive to the
very specific requirements of a database system like Postgres (e.g.
inconsistencies between WAL and data files can emerge in many
scenarios).

My general recommendation is to avoid these tools completely --
consistently use a backup solution like pgBackRest instead.

BTW, running pg_repack is something that creates additional risk of
database corruption, at least to some degree. That seems less likely
to have been the problem here (I think that it's probably something
with snapshots). Something to consider.

> I can do this. But to add: when we run pg_repack or rebuild the indexes, this is resolved automatically.

Your bug report was useful to me, because it made me realize that the
posting list split code in _bt_swap_posting() is unnecessarily
trusting of the on-disk data -- especially compared to _bt_split(),
the page split code. While I consider it unlikely that the problem
that you see is truly a bug in Postgres, it is still true that the
crash that you saw should probably have just been an error.

We don't promise that the database cannot crash even with corrupt
data, but we do try to avoid it whenever possible. I may be able to
harden _bt_swap_posting(), to make failures like this a little more
friendly. It's an infrequently hit code path, so we can easily afford
to make the code more careful/less trusting.

--
Peter Geoghegan
