Re: crash 11.5~ (and 11.4)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: crash 11.5~ (and 11.4)
Date: 2019-08-07 23:51:54
Message-ID: 20190807235154.erbmr4o4bo6vgnjv@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-08-07 14:29:28 -0500, Justin Pryzby wrote:
> Just found this, although I'm not sure what to do about it. If it's corrupt
> table data, I can restore from backup.
>
> ts=# VACUUM FREEZE VERBOSE child.huawei_umts_ucell_201908;
> INFO: 00000: aggressively vacuuming "child.huawei_umts_ucell_201908"
> LOCATION: lazy_scan_heap, vacuumlazy.c:502
> ERROR: XX001: found xmin 73850277 from before relfrozenxid 111408920
> LOCATION: heap_prepare_freeze_tuple, heapam.c:6853

Ugh :(

We really need to add a error context to vacuumlazy that shows which
block is being processed.

> I confirmed I updated to 11.4 immediately on its release:
> [pryzbyj(at)database ~]$ ls -ltc /usr/pgsql-11/bin/postgres
> -rwxr-xr-x. 1 root root 7291736 Jun 20 07:09 /usr/pgsql-11/bin/postgres

Any chance you might not have restarted postgres at that time?

> That table would've been created shortly after midnight on Aug 1 when we loaded
> first data for the month. So it was created and processed only on pg11.4,
> although the parent has probably been around since pg_upgrade last October.

I don't think there's a way the parent could play a role.

> Here's usable looking bt
>
> #0 heap_prepare_freeze_tuple (tuple=0x7fd3d8fa0058, relfrozenxid=111408920,
> relminmxid=1846, cutoff_xid=111658731, cutoff_multi=1846, frz=0x223e930,
> totally_frozen_p=0x7ffff5f3848f) at heapam.c:6832
> changed = 211
> xmax_already_frozen = 127
> xmin_frozen = false
> freeze_xmax = false
> xid = 0
> __func__ = "heap_prepare_freeze_tuple"

Hm. That's a backtrace to what precisely? Are you sure it's the the
erroring call to heap_prepare_freeze_tuple? Because I think that's at
the function's start - which is why some of the stack variables have
bogus contents.

I think you'd need to set the breakpoint to heapam.c:6850 to be sure to
catch the error (while the error message heapam.c:6853, that's because
the line macro in some compilers expands to the end of the statement).

> errfinish one seems to be not useful?
>
> (gdb) bt
> #0 errfinish (dummy=0) at elog.c:415
> #1 0x000000000054aefd in ShowTransactionStateRec (
> str=0xafed12 "StartTransaction", s=0xf9efa0) at xact.c:5157
> #2 0x000000000054ad5c in ShowTransactionState (
> str=0xafed12 "StartTransaction") at xact.c:5130
> #3 0x00000000005470c8 in StartTransaction () at xact.c:1961
> #4 0x0000000000547db9 in StartTransactionCommand () at xact.c:2734
> #5 0x00000000008cbe65 in start_xact_command () at postgres.c:2500
> #6 0x00000000008c9772 in exec_simple_query (
> query_string=0x2160d28 "VACUUM FREEZE VERBOSE child.huawei_umts_ucell_201908;") at postgres.c:948
> #7 0x00000000008cddf3 in PostgresMain (argc=1, argv=0x218c768,
> dbname=0x218c5d0 "ts", username=0x218c5b0 "pryzbyj") at postgres.c:4182
> #8 0x000000000082a098 in BackendRun (port=0x21818d0) at postmaster.c:4358
> #9 0x0000000000829806 in BackendStartup (port=0x21818d0) at postmaster.c:4030
> #10 0x0000000000825cab in ServerLoop () at postmaster.c:1707
> #11 0x00000000008255dd in PostmasterMain (argc=3, argv=0x215b9a0)
> at postmaster.c:1380
> #12 0x000000000074ba30 in main (argc=3, argv=0x215b9a0) at main.c:228

That's probably just the errfinish for a debug message. Might be easier
to set a breakpoint to pg_re_throw().

> I looked at this briefly:
>
> ts=# SELECT * FROM page_header(get_raw_page('child.huawei_umts_ucell_201908', 0));
> lsn | 3A49/66F44310
> checksum | 0
> flags | 4
> lower | 36
> upper | 1432
> special | 8192
> pagesize | 8192
> version | 4
> prune_xid | 0

That's not necessarily the target block though, right? It'd be useful
to get this for the block with the corruption, and perhaps one
before/after. If the backtrace I commented on at the top is correct,
the relevant tuple was at "blkno = 34915" (visible a frame or two
up).

Also heap_page_items() for those pages would be useful - you might want
to not include the t_data column, as it has raw tuple data.

Could you show
* pg_controldata
* SELECT oid, datname, datfrozenxid, datminmxid FROM pg_database;

That should be easy to redact if necessary. It'd be useful to know which
database the corrupted table is in.

It'd also be helpful to get something roughly like
SELECT
oid, oid::regclass,
relkind, reltoastrelid,
relfrozenxid, age(relfrozenxid),
relminmxid, mxid_age(relminmxid),
relpages,
(SELECT txid_current())
FROM pg_class
WHERE relfrozenxid <> 0;

But I'm not sure if including the tablenames would be a problematic for
your case. Possibly a bit hard to infer "usage" correlations without,
but probably still worthwhile.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2019-08-08 00:05:08 Re: SQL/JSON path: collation for comparisons, minor typos in docs
Previous Message Sehrope Sarkuni 2019-08-07 23:40:05 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)