Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby

From: Serge Negodyuck <petr(at)petrovich(dot)kiev(dot)ua>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby
Date: 2013-12-18 14:20:48
Message-ID: CABKyZDEENX2X5HMqNtMB34zBAr2UZxrcs4SbXx169xDRKeZ4DA@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

2013/12/10 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
> Andres Freund wrote:
>
>> > > I think problems should be preventable if you issue a systemwide VACUUM
>> > > FREEZE, but please let others chime in before you execute it.
>> >
>> > I wouldn't freeze anything just yet, at least until the patch to fix
>> > multixact freezing is in.
>>
>> Well, it seems better than getting errors because of multixact members
>> that are gone.
>> Maybe PGOPTIONS='-c vacuum_freeze_table_age=0 -c vacuum_freeze_min_age=1000000' vacuumdb -a
>> - that ought not to cause problems with current data and should freeze
>> enough to get rid of problematic multis?
>
> TBH I don't feel comfortable predicting what it will freeze with
> the broken code.
>

You guys were right. After a week this issue occurred again on almost
all slave servers.

slave:
2013-12-17 14:21:20 MSK CONTEXT: xlog redo delete: index
1663/16516/5320124; iblk 8764, heap 1663/16516/18816;
2013-12-17 14:21:20 MSK LOG: file "pg_clog/0370" doesn't exist,
reading as zeroes
2013-12-17 14:21:20 MSK FATAL: MultiXactId 1819308905 has not been
created yet -- apparent wraparound
2013-12-17 14:21:20 MSK CONTEXT: xlog redo delete: index
1663/16516/5320124; iblk 8764, heap 1663/16516/18816;
2013-12-17 14:21:20 MSK LOG: startup process (PID 13622) exited with exit code 1

I had to fix something on the master, since all slaves were affected. The
only idea I had was to perform VACUUM FREEZE on the master.

I believe that was not a good idea. I suppose "vacuum freeze" led
to the following errors on the master:
2013-12-17 13:15:34 EET 172.18.10.44 ruprom ERROR: could not access
status of transaction 8407326
2013-12-17 13:15:34 EET 172.18.10.44 ruprom DETAIL: Could not open
file "pg_multixact/members/A458": No such file or directory.

The only way out was to perform a full backup/restore, which did not
succeed: it failed with the same error (could not access status of
transaction xxxxxxx).
A very ugly hack was to copy pg_multixact/members/0000 ->
pg_multixact/members/[ABCDF]xxx. It helped to complete the full backup,
but I am not sure about the consistency of the data.
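
To see exactly which member segments the server reports as missing, the
DETAIL lines from the log can be collected, rather than guessing from the
leading hex digit. Below is a minimal sketch in Python. The function name
missing_member_segments is hypothetical, and the regex assumes the DETAIL
message format shown in the master log excerpt above. It only reads log
text and does not touch the data directory.

```python
import re

# Assumption: the DETAIL message looks like the one in the log excerpt above.
# \s+ tolerates line wrapping introduced when the log is pasted into mail.
_MISSING = re.compile(
    r'Could not open\s+file\s+"pg_multixact/members/([0-9A-Fa-f]+)"'
)


def missing_member_segments(log_text):
    """Return the sorted, de-duplicated segment names reported as missing."""
    return sorted({m.group(1).upper() for m in _MISSING.finditer(log_text)})


if __name__ == "__main__":
    # Sample text copied from the master log excerpt above.
    sample = (
        "2013-12-17 13:15:34 EET 172.18.10.44 ruprom DETAIL: Could not open\n"
        'file "pg_multixact/members/A458": No such file or directory.\n'
    )
    print(missing_member_segments(sample))
```

This only reports what the log already says. It does not tell whether the
referenced multixacts are still needed by any tuple.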

My question is: is there any quick-and-dirty way to disable
pg_multixact deletion? I understand it may lead to wasted space.
