Re: ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Paul Smith <paul(at)pscs(dot)co(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ERROR: MultiXactId xxxx has not been created yet -- apparent wraparound
Date: 2015-05-26 21:41:09
Message-ID: 20150526214109.GZ5885@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Paul Smith wrote:

> In the backup, for the row which will become broken, I get:
>
> ctid | xmin | xmax
> ------------+----------+----------
> (18722,29) | 23862661 | 23862661

Okay.

> Assuming it hadn't changed in the broken version (from application
> knowledge, it's unlikely to change), then on the broken version,
> select heap_page_items(get_raw_page('messages.msgdata', 18722)); gives:
>
> (21,3112,1,812,1332175474,894779760,223823413,"(1443526721,25142)",12141,18516,57,,)
> (23,4448,1,352,728903474,1466642284,1869042033,"(1262568257,16743)",16751,26455,65,111101101000001011010010100000101010001011110110100000101101001010000010100000000000000000000000000000001010000001110110110000001000000001001000101000010110111000000000000000001100000111100010000000000000000010110010101100101101100000000000110010110010110011000000000000000000000000000000000000000000000000000000000000000000000010100001,)
> (24,4184,1,264,619804,228,8000,"(956969542,26694)",22857,29251,98,110001101100001001100110011001100001110010000010001000100000111011110100110100101001011000101100100101100110001001000010101011100101111001010110111100101100001011000110101000101101011000011100010101100100110010011100110100101001101001100110110001101000101000110110100101100100001010011110000010100011011001000010101011001111001010100110010010100111011000101110010010101111010000011010110101101000001000101100101100101000101000001110000110101110001000001100110001100001001001010010000010100101001000011110010010100110110010000110101100000101000011100110110010100011011010110110100110101111010000100110,)
> (26,3928,1,252,1212837441,1232350037,1668772674,"(728910135,11093)",31284,16687,70,1111011010100010100010100100110011001100100000101101001001110110000101101110011001100110101101101001110010011110010010101000110011110100000100100010111011010100011010100001001010110110101100100011001000101010111011000000111011110100110011001011000001010000101101101111010010110110100001101000001010100010011011100100101010110110001100100101011001001110001101101100011001100110,1717791858)
> (29,2296,1,812,1229271346,1934308693,1733443957,"(1848337707,21358)",31337,21592,88,,1178686785)
> (30,1944,1,352,947155032,1380218998,792031842,"(1211650927,22579)",30585,20532,80,,)
> (31,1560,1,380,23935627,23935627,1,"(18722,31)",40976,9510,24,,)
> (31 rows)
>
>
> The data for index 29 has 1934308693 in the header (which is the multixactid
> reported in the error message when trying to retrieve that row).

Right. The ctids on items 21, 23, 24, 26,29 and 30 are pretty obviously
corrupted (page numbers are too large, and offset numbers are way above
normal offset numbers), as is probaly everything else about them. Note
xmin should be a valid xid, but for example on item 21 it's 133 million
which obviously is not a valid xid on your system. Items 26 and 29 even
have an OID, which the others do not (indicating wrong HEAP_HASOID
flag).

Note the lp_off fields. It seems that all valid items have offsets
above 4096, and corrupted items have offsets below that. The theory
that somehow the system wrote a bad filesystem page (4096 bytes) on the
lower half of the Postgres page (8192 bytes) sounds rather more probable
now.

> I'm struggling to find the definitions for the data returned by
> heap_page_items,

Try \df+ heap_page_items; or more easily, use this query instead:

select * from heap_page_items(get_raw_page('messages.msgdata', 18722)); gives:

> If I look at the raw page data, it contains data relating to this
> installation, so it doesn't look like a page from a different Hyper-V guest.
> It also doesn't look like just random data from a memory error. It actually
> looks like legitimate data which should possibly have been written
> elsewhere. (I don't want to post it here because it contains possibly
> confidential data from one of our customers).

Sure.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Martín Marqués 2015-05-26 22:01:19 Re: about lob(idea)
Previous Message Jeff Janes 2015-05-26 21:26:03 Re: Redesigning checkpoint_segments