Help troubleshooting SubtransControlLock problems

From: Scott Frazer <sfrazer(at)couponcabin(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Help troubleshooting SubtransControlLock problems
Date: 2018-03-07 03:24:18
Message-ID: CA+ey=amBhfD4Ascc4yyoKRbh+FUx6wwMkujsZ7Ou+xOY2AwsSg@mail.gmail.com
Lists: pgsql-general

Hi, we have a Postgres 9.6 setup using replication, and the read replicas
have recently started showing a lot of processes stuck with
"SubtransControlLock" as their wait_event. They look like this, except
there are usually about 300-800 of them:

179706 | LWLockNamed | SubtransControlLock
186602 | LWLockNamed | SubtransControlLock
186606 | LWLockNamed | SubtransControlLock
180947 | LWLockNamed | SubtransControlLock
186621 | LWLockNamed | SubtransControlLock
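
A listing like the one above can be pulled from pg_stat_activity, which
exposes wait_event_type and wait_event as of 9.6. The queries below are only
a sketch of that, not necessarily the exact query used here; the "backends"
column alias is made up for illustration.

```sql
-- List the backends currently waiting on SubtransControlLock.
SELECT pid, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event = 'SubtransControlLock';

-- Count waiting backends per wait event, to see how many are piled up.
SELECT wait_event_type, wait_event, count(*) AS backends
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY backends DESC;
```

Run on the replica while it is slow, the second query should show whether
SubtransControlLock accounts for most of the waiting backends.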

The server then begins to crawl, with some queries just never finishing
until I finally shut the server down.

Searching for that particular combo of wait_event_type and wait_event only
seems to turn up the page about statistics collection, but no helpful
information on troubleshooting this lock.

Restarting the replica server clears the locks and allows us to start
working again, but it's happened twice now in 12 hours and I'm worried it
will happen again.

Does anyone have any advice on where to start looking?

Thanks,
Scott
