RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: Chengchao Yu <chengyu(at)microsoft(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Prabhat Tripathi <ptrip(at)microsoft(dot)com>, Sunil Kamath <Sunil(dot)Kamath(at)microsoft(dot)com>, Michal Primke <mprimke(at)microsoft(dot)com>, TEJA Mupparti <Tejeswar(dot)Mupparti(at)microsoft(dot)com>
Subject: RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs
Date: 2019-07-24 07:08:54
Message-ID: CAKPRHzKsj0J0Q51L1SF-G7OmkgkXyifp1Aqe7T--S1gK0W7anQ@mail.gmail.com
Lists: pgsql-hackers

Sorry in advance for the link-breaking message forced by gmail.

https://www.postgresql.org/message-id/flat/CY4PR2101MB0804CE9836E582C0702214E8AAD30(at)CY4PR2101MB0804(dot)namprd21(dot)prod(dot)outlook(dot)com

I assume we have reached consensus on the problem we are fixing
here.

> 0a 00000004`8080cc30 00000004`80dcf917 postgres!PGSemaphoreLock+0x65 [d:\orcasqlagsea10\14\s\src\backend\port\win32_sema.c @ 158]
> 0b 00000004`8080cc90 00000004`80db025c postgres!LWLockAcquire+0x137 [d:\orcasqlagsea10\14\s\src\backend\storage\lmgr\lwlock.c @ 1234]
> 0c 00000004`8080ccd0 00000004`80db25db postgres!AbortBufferIO+0x2c [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 3995]
> 0d 00000004`8080cd20 00000004`80dbce36 postgres!AtProcExit_Buffers+0xb [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 2479]
> 0e 00000004`8080cd50 00000004`80dbd1bd postgres!shmem_exit+0xf6 [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 262]
> 0f 00000004`8080cd80 00000004`80dbccfd postgres!proc_exit_prepare+0x4d [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 188]
> 10 00000004`8080cdb0 00000004`80ef9e74 postgres!proc_exit+0xd [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 141]
> 11 00000004`8080cde0 00000004`80ddb6ef postgres!errfinish+0x204 [d:\orcasqlagsea10\14\s\src\backend\utils\error\elog.c @ 624]
> 12 00000004`8080ce50 00000004`80db0f59 postgres!mdread+0x12f [d:\orcasqlagsea10\14\s\src\backend\storage\smgr\md.c @ 806]

OK, this is what we are fixing. The proposed patch makes
LWLockReleaseAll() run before the exit callback installed by
InitBufferPoolBackend(), by registering the former in the on_shmem_exit
list after the latter (the list is run in reverse order of
registration). Even if that works, I think it's neither clean nor safe
to depend on multiple order-sensitive callbacks.
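To make the ordering dependence concrete, here is a minimal, self-contained C sketch of a LIFO exit-callback list. It is not PostgreSQL code: on_shmem_exit_sim(), shmem_exit_sim() and the *_sim callbacks are hypothetical stand-ins that only record the order in which they run, assuming (as with on_shmem_exit) that callbacks run in reverse order of registration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of the on_shmem_exit list: callbacks are
 * stored in registration order and run in reverse (LIFO) order at exit.
 * This is NOT PostgreSQL's ipc.c; all *_sim names are made up. */
#define MAX_EXIT_CALLBACKS 8

typedef void (*exit_callback)(char *out, size_t outsize);

static exit_callback exit_callbacks[MAX_EXIT_CALLBACKS];
static int n_exit_callbacks = 0;

static void on_shmem_exit_sim(exit_callback cb)
{
    assert(n_exit_callbacks < MAX_EXIT_CALLBACKS);
    exit_callbacks[n_exit_callbacks++] = cb;
}

/* Run every registered callback, most recently registered first. */
static void shmem_exit_sim(char *out, size_t outsize)
{
    while (n_exit_callbacks > 0)
        exit_callbacks[--n_exit_callbacks](out, outsize);
}

/* Stand-ins that only record that they ran. */
static void release_all_lwlocks_sim(char *out, size_t outsize)
{
    strncat(out, "LWLockReleaseAll;", outsize - strlen(out) - 1);
}

static void atprocexit_buffers_sim(char *out, size_t outsize)
{
    strncat(out, "AtProcExit_Buffers;", outsize - strlen(out) - 1);
}

/* Register the callbacks in the order the patch does, then simulate exit.
 * Writes the order in which the callbacks ran into out. */
void simulate_patch_exit_order(char *out, size_t outsize)
{
    out[0] = '\0';
    /* InitBufferPoolBackend() registers the buffer cleanup first ... */
    on_shmem_exit_sim(atprocexit_buffers_sim);
    /* ... and the patch registers the lock release afterwards ... */
    on_shmem_exit_sim(release_all_lwlocks_sim);
    /* ... so at exit the lock release runs before the buffer cleanup. */
    shmem_exit_sim(out, outsize);
}
```

Calling simulate_patch_exit_order(buf, sizeof buf) leaves "LWLockReleaseAll;AtProcExit_Buffers;" in buf. Swapping the two registrations reverses the result, which is the fragility of relying on registration order.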

AtProcExit_Buffers has the following comment:

> * During backend exit, ensure that we released all shared-buffer locks and
> * assert that we have no remaining pins.

And AtProcExit_Buffers() is called only from shmem_exit(). Moreover,
every other call site of AbortBufferIO() calls LWLockReleaseAll() just
before it. If that's the case, why don't we just release all LWLocks in
shmem_exit() or in AtProcExit_Buffers() before calling AbortBufferIO()?
I think it's sufficient for AtProcExit_Buffers() to call
LWLockReleaseAll() at its beginning. (The function's comment needs
editing.)
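A minimal C sketch of the suggested alternative, under stated assumptions: the lock is modeled as non-reentrant, and, as the stack trace suggests, the exiting process is assumed to still hold the lock that AbortBufferIO() tries to acquire. SimLock, sim_lock_acquire() and the other sim_* functions are hypothetical; where the real LWLockAcquire() would block forever, the sketch returns false instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a non-reentrant lock such as an LWLock. Where the
 * real acquire would block forever on a lock we already hold, this sketch
 * reports a self-deadlock by returning false instead of hanging. */
typedef struct
{
    bool held_by_me;
} SimLock;

#define MAX_HELD_LOCKS 4
static SimLock *held_locks[MAX_HELD_LOCKS];
static int n_held_locks = 0;

static bool sim_lock_acquire(SimLock *lock)
{
    if (lock->held_by_me)
        return false;   /* single process: nobody else will ever release it */
    assert(n_held_locks < MAX_HELD_LOCKS);
    lock->held_by_me = true;
    held_locks[n_held_locks++] = lock;
    return true;
}

static void sim_lock_release(SimLock *lock)
{
    for (int i = 0; i < n_held_locks; i++)
    {
        if (held_locks[i] == lock)
        {
            held_locks[i] = held_locks[--n_held_locks];
            lock->held_by_me = false;
            return;
        }
    }
}

/* Stand-in for LWLockReleaseAll(): drop every lock this process holds. */
static void sim_lock_release_all(void)
{
    while (n_held_locks > 0)
        held_locks[--n_held_locks]->held_by_me = false;
}

/* Stand-in for AbortBufferIO(): must (re)acquire the buffer's I/O lock. */
static bool sim_abort_buffer_io(SimLock *io_lock)
{
    if (!sim_lock_acquire(io_lock))
        return false;
    /* ... clean up the buffer's I/O state here ... */
    sim_lock_release(io_lock);
    return true;
}

/* Stand-in for AtProcExit_Buffers(), optionally with the suggested change. */
static bool sim_atprocexit_buffers(SimLock *io_lock, bool release_locks_first)
{
    if (release_locks_first)
        sim_lock_release_all();
    return sim_abort_buffer_io(io_lock);
}

/* Scenario: a read fails while this process still holds the I/O lock, and
 * the error escalates to process exit. Returns true if cleanup completed. */
bool sim_exit_after_failed_read(bool release_locks_first)
{
    SimLock io_lock = { false };
    bool    ok;

    sim_lock_acquire(&io_lock);         /* I/O in progress: lock is held */
    ok = sim_atprocexit_buffers(&io_lock, release_locks_first);
    sim_lock_release_all();             /* reset model state for the next run */
    return ok;
}
```

In this model, sim_exit_after_failed_read(false) returns false (the self-deadlock case), while sim_exit_after_failed_read(true), where the buffer cleanup releases all locks at its start, returns true. Putting the release inside the cleanup function itself removes any dependence on the order in which exit callbacks were registered.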

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
