Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Stuart Bishop <stuart(at)stuartbishop(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: BUG #7546: Backups on hot standby cancelled despite hot_standby_feedback=on
Date: 2012-09-19 11:56:11
Message-ID: 201209191356.12176.andres@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wednesday, September 19, 2012 11:33:29 AM Stuart Bishop wrote:
> On Wed, Sep 19, 2012 at 3:38 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:
> >> >> ERROR: canceling statement due to conflict with recovery
> >> >> DETAIL: User was holding shared buffer pin for too long.
> >> >
> >> > Maybe I'm missing something, but ISTM that hot_standby_feedback
> >> > doesn't eliminate the query cancels caused by buffer pin lock which
> >> > you encountered. It eliminates only the query cancels caused by
> >> > cleanup of rows. So you might need to set max_standby_streaming_delay
> >> > to -1, to avoid query cancels.
> >
> > I think the above is just a case of the backend waiting a long time to
> > send data out due to your rate limiting. Why it does that holding a
> > buffer pin may be worth investigating...
>
> I put the rate limiting in there to trigger the problem consistently
> with a small data set. On the main production server I see this
> problem, it normally fails dumping a table with nearly 2 billion rows
> to local disk (pg_dump -Fc). The trick seems to be that vacuum needs
> to start running on the primary after the dump starts on the hot
> standby, and the vacuum complete before the dump finishes.
You see buffer pin errors or you see the relation errors? Those arent the same.
So if you see no buffer pin errors but relation lock ones in production your
way to reproduce the issue isn't telling us very much ;)

> >> I have also seen the following (but only on production, not with my test
> >> case):
> >>
> >> ERROR: canceling statement due to conflict with recovery
> >> DETAIL: User was holding a relation lock for too long.
> >>
> >> If you are correct, it seems impossible with 9.1 to have reliable long
> >> transactions on a hot standby without allowing the hot standby to lag
> >> behind by the length of the longest transaction.
> >
> > Is it possible that you perform such commands on tables used on the
> > standby?
>
> Yes, it is quite possible CREATE INDEX CONCURRENTLY was run on the
> primary during the backup window. I can't discount other commands
> either, but if that is the case one of our ops needs a spanking. I've
> only been able to reproduce the buffer pin error in isolation.
Do you have DDL logging enabled so you could investigate that?

I really wonder why we don't bother logging the dbOid/relOid pair before we
cancel transactions due to lock conflicts. That should make investigating such
issues considerably easier...

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message pathak.anurag 2012-09-19 13:21:03 BUG #7558: Postgres service not starting.
Previous Message zimnyx 2012-09-19 11:09:57 BUG #7557: Transaction sees outdated grants.