Re: Deadlocks in HS (on 9.0 :( )

From: Noah Misch <noah(at)leadboat(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Deadlocks in HS (on 9.0 :( )
Date: 2014-07-16 04:25:55
Message-ID: 20140716042555.GA2165511@tornado.leadboat.com
Lists: pgsql-hackers

On Tue, Jul 15, 2014 at 04:54:05PM +0100, Greg Stark wrote:
> We've observed a 9.0 database have undetected deadlocks repeatedly in
> hot standby mode.
>
> I think what's happening is that autovacuum is kicking off a VACUUM of
> some system catalogs -- it seems to usually be pg_statistic's TOAST
> table actually. At the end of the vacuum it briefly gets the exclusive
> lock to truncate the table. On the standby it replays that and records
> the exclusive lock being taken. It then sees a cleanup record that
> pauses replay because a HS standby transaction is running that can see
> the xid being cleaned up. That transaction then blocks against the
> exclusive lock and deadlocks against recovery.
>
> We expect upgrading to 9.3 to fix the problem for us due to the xid
> feedback mechanism. But is this still a known problem when feedback is
> not enabled?
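The cycle in the quoted report can be modeled as a two-node wait-for graph: the startup process (WAL replay) waits for the hot standby query because of the cleanup conflict, and the query waits for the startup process because of the replayed exclusive lock. The following is a minimal Python sketch under that assumption; the node labels and `find_cycle` are hypothetical, and this is a generic depth-first cycle search, not PostgreSQL's deadlock detector.

```python
from typing import Dict, List, Optional, Set

# Hypothetical labels for the two participants in the reported deadlock.
STARTUP = "startup process (WAL replay)"
QUERY = "hot standby query"


def find_cycle(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in waits_for}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(waits_for.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # Back edge: the cycle is the part of the stack starting at nxt.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for start in sorted(waits_for):
        if color[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# Edges as described in the report (an assumed simplification).
WAITS_FOR = {
    # Replay is paused: the query can still see the xid being cleaned up.
    STARTUP: {QUERY},
    # The query is blocked on the exclusive lock replayed for the truncation.
    QUERY: {STARTUP},
}

if __name__ == "__main__":
    print(find_cycle(WAITS_FOR))
```

Neither edge alone is a problem; the report's point is that nothing on the standby notices when both exist at once.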

This is the first I've heard of the problem.

> And is it a problem we should try to find a backpatchable
> fix for?

Yes. Undetected deadlock entirely within the confines of the system is a
clear bug, so let's back-patch if the fix proves suitable for that.

> I'm pondering whether we really need to log the exclusive lock taken
> by vacuum when truncating. The worst case is a scan already in
> progress; perhaps we can make scans handle tables that have been
> truncated concurrently? We could always make the truncate replay
> command acquire the lock and release it itself right away.
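To show why releasing the lock immediately during replay would break the cycle, here is a toy Python replay loop. It assumes a simplified WAL stream in which the truncation's lock record is followed by a cleanup record that pauses replay, with the commit (which releases the lock) arriving only later. The record kinds, relation label, and functions are hypothetical and do not correspond to PostgreSQL's actual WAL record types or standby code.

```python
from typing import Iterable, Set, Tuple

# (record kind, relation) pairs; hypothetical labels, not real WAL records.
Record = Tuple[str, str]

TOAST_REL = "pg_statistic TOAST table"

STREAM = [
    ("lock", TOAST_REL),     # exclusive lock taken by VACUUM to truncate
    ("cleanup", TOAST_REL),  # conflicts with a running standby query
    ("commit", TOAST_REL),   # would release the lock, but is never reached
]


def replay_until_pause(records: Iterable[Record],
                       release_immediately: bool) -> Tuple[Set[str], bool]:
    """Replay records; return (relations locked by startup, whether replay paused)."""
    held: Set[str] = set()
    for kind, rel in records:
        if kind == "lock":
            held.add(rel)
            if release_immediately:
                # Proposed variant: acquire the lock during replay and drop it at once.
                held.discard(rel)
        elif kind == "commit":
            # In this model, the vacuum's locks are released when its commit is replayed.
            held.clear()
        elif kind == "cleanup":
            # Replay stops: a standby query can still see the xid being removed.
            return held, True
    return held, False


def is_deadlocked(records: Iterable[Record], release_immediately: bool,
                  query_reads: str) -> bool:
    held, paused = replay_until_pause(records, release_immediately)
    startup_waits_for_query = paused
    query_waits_for_startup = query_reads in held
    return startup_waits_for_query and query_waits_for_startup


if __name__ == "__main__":
    print(is_deadlocked(STREAM, release_immediately=False, query_reads=TOAST_REL))
    print(is_deadlocked(STREAM, release_immediately=True, query_reads=TOAST_REL))
```

The model only removes the query's wait on the startup process; it says nothing about the cost the message raises, namely that scans would then need to cope with a table truncated underneath them.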

Perhaps so. Heikki had a broader design in that area:
http://www.postgresql.org/message-id/flat/5193AB47.3070801@vmware.com

The lock VACUUM takes before truncating a relation is the main (only?) source
of spontaneous recovery conflicts not addressed by hot_standby_feedback, so
any of the above would constitute a nice step forward.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
