Re: crash 11.5~ (and 11.4)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: crash 11.5~ (and 11.4)
Date: 2019-08-08 01:24:36
Message-ID: 20190808012436.GG11185@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 07, 2019 at 04:51:54PM -0700, Andres Freund wrote:
> Hi,
>
> On 2019-08-07 14:29:28 -0500, Justin Pryzby wrote:
> > Just found this, although I'm not sure what to do about it. If it's corrupt
> > table data, I can restore from backup.

In the meantime, I've renamed+uninherited the table and restored from backup,
with the broken table preserved.

However, the previous days' backup did this, so I used Monday's backup.

[pryzbyj(at)database ~]$ time sudo -u postgres pg_restore /srv/cdrperfbackup/ts/2019-08-06/curtables/ -t huawei_umts_ucell_201908 --verbose -d ts --host /tmp
pg_restore: connecting to database for restore
pg_restore: creating TABLE "child.huawei_umts_ucell_201908"
pg_restore: INFO: partition constraint for table "huawei_umts_ucell_201908" is implied by existing constraints
pg_restore: processing data for table "child.huawei_umts_ucell_201908"
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 62710; 0 2015128626 TABLE DATA huawei_umts_ucell_201908 telsasoft
pg_restore: [archiver (db)] COPY failed for table "huawei_umts_ucell_201908": ERROR: invalid byte sequence for encoding "UTF8": 0xe7 0x28 0x9d
CONTEXT: COPY huawei_umts_ucell_201908, line 104746

Also, I found core with this BT from my own manual invocation of ANALYZE.

(gdb) bt
#0 0x00000039674324f5 in raise () from /lib64/libc.so.6
#1 0x0000003967433cd5 in abort () from /lib64/libc.so.6
#2 0x0000000000a3c4b2 in ExceptionalCondition (conditionName=0xaad9e7 "!(j > attnum)", errorType=0xaad84c "FailedAssertion", fileName=0xaad840 "heaptuple.c", lineNumber=582) at assert.c:54
#3 0x00000000004856a7 in nocachegetattr (tuple=0x1c116b08, attnum=1, tupleDesc=0x23332b0) at heaptuple.c:582
#4 0x000000000062981d in std_fetch_func (stats=0x22700c8, rownum=54991, isNull=0x7ffff5f3844f) at analyze.c:1718
#5 0x000000000062aa99 in compute_scalar_stats (stats=0x22700c8, fetchfunc=0x62942b <std_fetch_func>, samplerows=120000, totalrows=227970) at analyze.c:2370
#6 0x00000000006270db in do_analyze_rel (onerel=0x7fd37e2afa50, options=2, params=0x7ffff5f38a70, va_cols=0x0, acquirefunc=0x627e9e <acquire_sample_rows>, relpages=75494, inh=false,
in_outer_xact=false, elevel=13) at analyze.c:579
#7 0x0000000000626616 in analyze_rel (relid=2015128626, relation=0x2161820, options=2, params=0x7ffff5f38a70, va_cols=0x0, in_outer_xact=false, bstrategy=0x21f7848) at analyze.c:310
#8 0x00000000006c59b2 in vacuum (options=2, relations=0x21f79c0, params=0x7ffff5f38a70, bstrategy=0x21f7848, isTopLevel=true) at vacuum.c:357
#9 0x00000000006c54e0 in ExecVacuum (vacstmt=0x2161910, isTopLevel=true) at vacuum.c:141
#10 0x00000000008d1d7e in standard_ProcessUtility (pstmt=0x21619d0, queryString=0x2160d28 "ANALYZE child. huawei_umts_ucell_201908;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x2161cc0, completionTag=0x7ffff5f39100 "") at utility.c:670

#3 0x00000000004856a7 in nocachegetattr (tuple=0x1c116b08, attnum=1, tupleDesc=0x23332b0) at heaptuple.c:582
natts = 554
j = 1
tup = 0x1c116b20
tp = 0x1c116b85 "\374\023\026\213^s#\347(\235=\326\321\067\032\245\321B\026}܋FS\375\244\003\065\336\277;\252O\006\065\320\353\211}F\237\373B\243\357J~\270\"\230ƣ\024xǍ\334\377\202\277S\031\375\351\003\220{\004"
bp = 0x1c116b37 "\270\027$U}\232\246\235\004\255\331\033\006Qp\376E\316h\376\363\247\366Նgy7\311E\224~F\274\023ϋ%\216,\221\331(at)\024\363\233\070\275\004\254L\217t\262X\227\352\346\347\371\070\321ш\221\350fc\316\r\356\351h\275\213\230\360\203\374\023\026\213^s#\347(\235=\326\321\067\032\245\321B\026}܋FS\375\244\003\065\336\277;\252O\006\065\320\353\211}F\237\373B\243\357J~\270\"\230ƣ\024xǍ\334\377\202\277S\031\375\351\003\220{\004"
slow = false
off = -1
#4 0x000000000062981d in std_fetch_func (stats=0x22700c8, rownum=54991, isNull=0x7ffff5f3844f) at analyze.c:1718
attnum = 2
tuple = 0x1c116b08
tupDesc = 0x23332b0
#5 0x000000000062aa99 in compute_scalar_stats (stats=0x22700c8, fetchfunc=0x62942b <std_fetch_func>, samplerows=120000, totalrows=227970) at analyze.c:2370
value = 546120480
isnull = false
i = 54991
null_cnt = 0
nonnull_cnt = 54991
toowide_cnt = 0
---Type <return> to continue, or q <return> to quit---
total_width = 274955
is_varlena = true
is_varwidth = true
corr_xysum = 3.9525251667299724e-323
ssup = {ssup_cxt = 0x22ae080, ssup_collation = 100, ssup_reverse = false, ssup_nulls_first = false, ssup_attno = 0, ssup_extra = 0x22ae5b0, comparator = 0xa06720 <varstrfastcmp_locale>,
abbreviate = false, abbrev_converter = 0, abbrev_abort = 0, abbrev_full_comparator = 0}
values = 0x204f9970
values_cnt = 54991
tupnoLink = 0x206ce5c0
track = 0x22ae198
track_cnt = 0
num_mcv = 100
num_bins = 100
mystats = 0x2270490

> > ts=# VACUUM FREEZE VERBOSE child.huawei_umts_ucell_201908;
> > INFO: 00000: aggressively vacuuming "child.huawei_umts_ucell_201908"
> > LOCATION: lazy_scan_heap, vacuumlazy.c:502
> > ERROR: XX001: found xmin 73850277 from before relfrozenxid 111408920
> > LOCATION: heap_prepare_freeze_tuple, heapam.c:6853
>
> Ugh :(
>
> We really need to add a error context to vacuumlazy that shows which
> block is being processed.

> > I confirmed I updated to 11.4 immediately on its release:
> > [pryzbyj(at)database ~]$ ls -ltc /usr/pgsql-11/bin/postgres
> > -rwxr-xr-x. 1 root root 7291736 Jun 20 07:09 /usr/pgsql-11/bin/postgres
>
> Any chance you might not have restarted postgres at that time?

I don't think so, nagios would've been yelling at me. Also..I found I have a
log of pg_settings, which shows server_version updated 2019-06-20
07:05:01.425645. Is there a bug in 11.3 which could explain it ?

> > Here's usable looking bt
> >
> > #0 heap_prepare_freeze_tuple (tuple=0x7fd3d8fa0058, relfrozenxid=111408920,
> > relminmxid=1846, cutoff_xid=111658731, cutoff_multi=1846, frz=0x223e930,
> > totally_frozen_p=0x7ffff5f3848f) at heapam.c:6832
> > changed = 211
> > xmax_already_frozen = 127
> > xmin_frozen = false
> > freeze_xmax = false
> > xid = 0
> > __func__ = "heap_prepare_freeze_tuple"
>
> Hm. That's a backtrace to what precisely? Are you sure it's the the
> erroring call to heap_prepare_freeze_tuple? Because I think that's at
> the function's start - which is why some of the stack variables have
> bogus contents.

Yes, I had just done this: b heap_prepare_freeze_tuple.

> I think you'd need to set the breakpoint to heapam.c:6850 to be sure to
> catch the error (while the error message heapam.c:6853, that's because
> the line macro in some compilers expands to the end of the statement).

I did 6848 and got;

#0 heap_prepare_freeze_tuple (tuple=0x7fd403a02058, relfrozenxid=111408920, relminmxid=1846, cutoff_xid=111665057, cutoff_multi=1846, frz=0x22aa6f8, totally_frozen_p=0x7ffff5f3848f) at heapam.c:6849
changed = false
xmax_already_frozen = false
xmin_frozen = false
freeze_xmax = false
xid = 73850277
__func__ = "heap_prepare_freeze_tuple"

> > errfinish one seems to be not useful?
> >
> > (gdb) bt
> > #0 errfinish (dummy=0) at elog.c:415
> > #1 0x000000000054aefd in ShowTransactionStateRec (
> > str=0xafed12 "StartTransaction", s=0xf9efa0) at xact.c:5157
> > [...]
>
> That's probably just the errfinish for a debug message. Might be easier
> to set a breakpoint to pg_re_throw().

Right, that works.

(gdb) bt
#0 pg_re_throw () at elog.c:1730
#1 0x0000000000a3cb1e in errfinish (dummy=0) at elog.c:467
#2 0x00000000004e72c4 in heap_prepare_freeze_tuple (tuple=0x7fd403a02058, relfrozenxid=111408920, relminmxid=1846, cutoff_xid=111665065, cutoff_multi=1846, frz=0x2260558, totally_frozen_p=0x7ffff5f3848f) at heapam.c:6850
#3 0x00000000006c8e6e in lazy_scan_heap (onerel=0x7fd37d5f99c8, options=13, vacrelstats=0x22603b8, Irel=0x2260500, nindexes=3, aggressive=true) at vacuumlazy.c:1151

> > I looked at this briefly:
> >
> > ts=# SELECT * FROM page_header(get_raw_page('child.huawei_umts_ucell_201908', 0));
> > lsn | 3A49/66F44310
> > checksum | 0
> > flags | 4
> > lower | 36
> > upper | 1432
> > special | 8192
> > pagesize | 8192
> > version | 4
> > prune_xid | 0
>
> That's not necessarily the target block though, right? It'd be useful
> to get this for the block with the corruption, and perhaps one
> before/after. If the backtrace I commented on at the top is correct,
> the relevant tuple was at "blkno = 34915" (visible a frame or two
> up).

Ah, right.

ts=# SELECT * FROM page_header(get_raw_page('child.huawei_umts_ucell_201908', 34914));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
---------------+----------+-------+-------+-------+---------+----------+---------+-----------
3A4A/4F2192B8 | 0 | 0 | 36 | 1520 | 8192 | 8192 | 4 | 0

ts=# SELECT * FROM page_header(get_raw_page('child.huawei_umts_ucell_201908', 34915));
3A4A/4F21C268 | 0 | 0 | 36 | 1520 | 8192 | 8192 | 4 | 0

ts=# SELECT * FROM page_header(get_raw_page('child.huawei_umts_ucell_201908', 34916));
3A4A/4F21DC90 | 0 | 0 | 36 | 1496 | 8192 | 8192 | 4 | 0

> Also heap_page_items() for those pages would be useful - you might want
> to not include the t_data column, as it has raw tuple data.

ts=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid FROM heap_page_items(get_raw_page('child.huawei_umts_ucell_201908', 34914));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits ... | t_oid

1 | 5952 | 1 | 2236 | 111659903 | 0 | 0 | (34914,1) | 554 | 2311 | 96 | 11111111111111111111111111111111111111011100111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111001111111111111111111111001111111111111111111111011111111111111111011111111101101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000000000001111111111111111111111111111111111111111111100111111000000 |
2 | 3736 | 1 | 2212 | 111659903 | 0 | 0 | (34914,2) | 554 | 2311 | 96 | 11111111111111111111111111111111111111011100111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111001111111111111111111111000111111111111111111111011111111111111111010111111101101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000000000001111111111111111111111111111111111111111111100111111000000 |
3 | 1520 | 1 | 2216 | 111659903 | 0 | 0 | (34914,3) | 554 | 2311 | 96 | 11111111111111111111111111111111111111011100111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000001011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111001111111111111111111111000111111111111111111111011111111111111111011111111101101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000000000001111111111111111111111111111111111111111111100111111000000 |

ts=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid FROM heap_page_items(get_raw_page('child.huawei_umts_ucell_201908', 34915));
1 | 5976 | 1 | 2212 | 111659903 | 0 | 0 | (34915,1) | 554 | 2311 | 96 | 11111111111111111111111111111111111111011100111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000001011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111001111111111111111111111000111111111111111111111011111111111111111010111111101101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000000000001111111111111111111111111111111111111111111100111111000000 |
2 | 3744 | 1 | 2232 | 111659903 | 0 | 0 | (34915,2) | 554 | 2311 | 96 | 11111111111111111111111111111111111111011100111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111001111111111111111111111001111111111111111111111011111111111111111010111111101101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000000000001111111111111111111111111111111111111111111100111111000000 |
3 | 1520 | 1 | 2220 | 111659903 | 0 | 0 | (34915,3) | 554 | 2311 | 96 | 11111111111111111111111111111111111111011100111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000001011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111001111111111111111111111000111111111111111111111011111111111111111010111111101101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000000000001111111111111111111111111111111111111111111100111111000000 |

ts=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid FROM heap_page_items(get_raw_page('child.huawei_umts_ucell_201908', 34916));
1 | 5952 | 1 | 2236 | 111659903 | 0 | 0 | (34916,1) | 554 | 2311 | 96 | 11111111111111111111111111111111111111011100111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111001111111111111111111111001111111111111111111111011111111111111111011111111101101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000000000001111111111111111111111111111111111111111111100111111000000 |
2 | 3736 | 1 | 2212 | 111659903 | 0 | 0 | (34916,2) | 554 | 2311 | 96 | 11111111111111111111111111111111111111011100111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111100000001011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111001111111111111111111111000111111111111111111111011111111111111111010111111101101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000000000001111111111111111111111111111111111111111111100111111000000 |
3 | 1496 | 1 | 2240 | 111659903 | 0 | 0 | (34916,3) | 554 | 2311 | 96 | 11111111111111111111111111111111111111011100111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111001111111111111111111111001111111111111111111111011111111111111111010111111101101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111001100000001111111111111111111111111111111111111111111100111111000000 |
> Could you show
> * pg_controldata

[pryzbyj(at)database ~]$ sudo -u postgres ./src/postgresql.bin/bin/pg_controldata -D /var/lib/pgsql/11/data
[sudo] password for pryzbyj:
pg_control version number: 1100
Catalog version number: 201809051
Database system identifier: 6616072377454496350
Database cluster state: in production
pg_control last modified: Wed 07 Aug 2019 06:32:05 PM MDT
Latest checkpoint location: 3A4C/D1BB5BD8
Latest checkpoint's REDO location: 3A4C/CCC1DF90
Latest checkpoint's REDO WAL file: 0000000100003A4C000000CC
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:111664949
Latest checkpoint's NextOID: 2069503852
Latest checkpoint's NextMultiXactId: 1846
Latest checkpoint's NextMultiOffset: 3891
Latest checkpoint's oldestXID: 33682470
Latest checkpoint's oldestXID's DB: 16564
Latest checkpoint's oldestActiveXID: 111664945
Latest checkpoint's oldestMultiXid: 1484
Latest checkpoint's oldestMulti's DB: 16564
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Wed 07 Aug 2019 06:31:35 PM MDT
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 200
max_worker_processes setting: 8
max_prepared_xacts setting: 0
max_locks_per_xact setting: 128
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
Mock authentication nonce: ff863744b99b849579d561c5117157c78a8ced0563df824f3035f70810fec534

> * SELECT oid, datname, datfrozenxid, datminmxid FROM pg_database;
ts=# SELECT oid, datname, datfrozenxid, datminmxid FROM pg_database;
oid | datname | datfrozenxid | datminmxid
-------+-----------+--------------+------------
13529 | template0 | 103564559 | 1846
16400 | template1 | 52362404 | 1486
16443 | postgres | 46060229 | 1485
16564 | ts | 33682470 | 1484
(4 rows)

> That should be easy to redact if necessary. It'd be useful to know which
> database the corrupted table is in.
>
> It'd also be helpful to get something roughly like
> SELECT oid, oid::regclass, relkind, reltoastrelid, relfrozenxid, age(relfrozenxid), relminmxid, mxid_age(relminmxid), relpages, (SELECT txid_current()) FROM pg_class WHERE relfrozenxid <> 0;

> But I'm not sure if including the tablenames would be a problematic for
> your case. Possibly a bit hard to infer "usage" correlations without,
> but probably still worthwhile.

I'll send to you individually, I don't think it's secret, but it's 60k lines
long... Its OID verifies to me that the table was created on 8/1, and not 7/1,
which was possible. Week-old backup also verifies that.

Thanks,
Justin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2019-08-08 02:40:29 Re: Global temporary tables
Previous Message Amit Langote 2019-08-08 01:10:11 Re: partition routing layering in nodeModifyTable.c