Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby

From: Serge Negodyuck <petr(at)petrovich(dot)kiev(dot)ua>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby
Date: 2014-06-10 09:26:27
Message-ID: CABKyZDE9casPHmbTtbGiNAzmd68Muh3i=KTgLd9sOVpU+_v+PA@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

2014-06-09 22:49 GMT+03:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
> Pushed a fix for this. Thanks for the report.
Thank you!

>
> > 2014-06-02 08:22:30 EEST FATAL: could not access status of transaction
> > 2080547
> > 2014-06-02 08:22:30 EEST DETAIL: Could not read from file
> > "pg_multixact/members/14078" at offset 24576: Success.
> > 2014-06-02 08:22:30 EEST CONTEXT: xlog redo create mxid 2080547 offset
> > 4294961608 nmembers 8684: 6193231 (keysh) 6193233 (fornokeyupd) 6193234
> > (keysh) 6193235 (fornokeyupd) 6193236 (keysh) 6193237 (fornokeyupd) 6193238
> > (keysh) 6193239 (fornokeyupd) 6193240 (keysh) 6193241 (fornokeyupd) 6193242
> > (keysh) 6193243 (fornokeyupd) 6193244 (keysh) 6193245 (fornokeyupd) 6193246
> > (keysh) 6193247 (fornokeyupd) 6193248 (keysh) 6193249 (fornokeyupd) 6193250
> > (keysh) 6193251 (fornokeyupd) 6193252 (keysh) 6193253 (fornokeyupd) 6193254
> > (keysh) 6193255 (fornokeyupd) 6193256 (keysh) 6193257 .......
>
> I find this bit rather odd. Normally the system shouldn't create
> multixacts this large. I think we might be missing a trick here somewhere.
> I imagine inserting the last few items is slow, isn't it?

Yes, insert durations grew to as much as 2.2 seconds before the crash:
2014-06-02 08:20:11 EEST 172.18.10.4 db LOG: duration: 2213.361 ms
statement: INSERT INTO product (...) VALUES (...) RETURNING product.id

Normally inserts finish within 100 ms (our log_min_duration_statement).
The same "xlog redo create mxid 2080547...." log entry was present on the
master and on both replica servers, which seems logical.
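
As a cross-check of the numbers in that log entry, here is a rough sketch of
how a member offset maps to a file under pg_multixact/members. The constants
are assumptions for 9.3 with the default 8 kB block size (members stored in
20-byte groups of four, 32 pages per segment file), and the function names
are made up for illustration. It is not the server's actual code.

```python
# Assumed layout for PostgreSQL 9.3 with the default 8 kB block size.
BLCKSZ = 8192
MEMBERS_PER_GROUP = 4                  # member xids stored per group
GROUP_SIZE = 4 + MEMBERS_PER_GROUP * 4 # 4 flag bytes + four 4-byte xids
MEMBERS_PER_PAGE = (BLCKSZ // GROUP_SIZE) * MEMBERS_PER_GROUP  # 1636
PAGES_PER_SEGMENT = 32                 # pages per segment file
OFFSET_SPACE = 2 ** 32                 # member offsets are 32-bit


def locate_member(offset):
    """Return (segment file name, byte position of the page in that file).

    Hypothetical helper, not a PostgreSQL function.
    """
    page = offset // MEMBERS_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    page_in_segment = page % PAGES_PER_SEGMENT
    return format(segment, "04X"), page_in_segment * BLCKSZ


def runs_past_offset_space(offset, nmembers):
    """True if the members starting at offset extend beyond 2^32."""
    return offset + nmembers > OFFSET_SPACE


if __name__ == "__main__":
    start = 4294961608   # "offset" from the log entry
    nmembers = 8684      # "nmembers" from the log entry
    print(locate_member(start))
    print(runs_past_offset_space(start, nmembers))
```

Under these assumptions the starting offset falls in file 14078, which agrees
with the file named in the DETAIL line, and offset plus nmembers is larger
than 2^32, so the member list runs past the end of the 32-bit offset space.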

>
> > An ugly hack "cp pg_multixact/members/14077 pg_multixact/members/14078"
> > helped me to start master server in replica.
> >
> >
> > Then, did pg_basebackup to slave database. It does not help
> > 2014-06-02 09:58:49 EEST 172.18.10.17 db2 DETAIL: Could not open file
> > "pg_multixact/members/1112D": No such file or directory.
> > 2014-06-02 09:58:49 EEST 172.18.10.18 db2 DETAIL: Could not open file
> > "pg_multixact/members/11130": No such file or directory.
> > 2014-06-02 09:58:51 EEST 172.18.10.34 db2 DETAIL: Could not open file
> > "pg_multixact/members/11145": No such file or directory.
> > 2014-06-02 09:58:51 EEST 172.18.10.38 db2 DETAIL: Could not open file
> > "pg_multixact/members/13F76": No such file or directory
>
> This is strange also; if the files are present in master, how come they
> weren't copied to the replica? I think we need more info about this
> problem.

I've looked through the logs thoroughly once again and have not
found anything interesting.
All I know is that there were very few pg_multixact/members files,
starting from 0000. This was the case on both slave servers, so I've
observed this issue twice.

To fix it I had to do a pg_dumpall | pg_restore on the master.
So, I'm sorry, I have no additional info about this problem.
