Re: POC: Cleaning up orphaned files using undo logs

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-07-30 11:56:13
Message-ID: CA+hUKGLv016-1y=Cwx+mme+cFRD5Bn03=2JVFnRB7JMLsA35=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit

I've been testing some undo worker workloads (more on that soon), but
here's a small thing: I managed to reach an LWLock self-deadlock in
the undo worker launcher:

diff --git a/src/backend/access/undo/undorequest.c
b/src/backend/access/undo/undorequest.c
...
+bool
+UndoGetWork(bool allow_peek, bool remove_from_queue, UndoRequestInfo *urinfo,
...
+ /* Search the queues under lock as they can be modified concurrently. */
+ LWLockAcquire(RollbackRequestLock, LW_EXCLUSIVE);
...
+ RollbackHTRemoveEntry(rh->full_xid,
rh->start_urec_ptr);

^ but that function acquires the same lock, leading to:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00007fff5d110106 libsystem_kernel.dylib`semop + 10
frame #1: 0x0000000104bbf24c
postgres`PGSemaphoreLock(sema=0x000000010e216a08) at pg_sema.c:428:15
frame #2: 0x0000000104c90186
postgres`LWLockAcquire(lock=0x000000010e218300, mode=LW_EXCLUSIVE) at
lwlock.c:1246:4
frame #3: 0x000000010487463d
postgres`RollbackHTRemoveEntry(full_xid=(value = 89144),
start_urec_ptr=20890721090967) at undorequest.c:1717:2
frame #4: 0x0000000104873dbe
postgres`UndoGetWork(allow_peek=false, remove_from_queue=false,
urinfo=0x00007ffeeb4d3e30, in_other_db_out=0x0000000000000000) at
undorequest.c:1388:5
frame #5: 0x0000000104876211 postgres`UndoLauncherMain(main_arg=0)
at undoworker.c:607:7
...
(lldb) print held_lwlocks[0]
(LWLockHandle) $0 = {
lock = 0x000000010e218300
mode = LW_EXCLUSIVE
}

--
Thomas Munro
https://enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2019-07-30 12:14:03 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Previous Message Sehrope Sarkuni 2019-07-30 11:44:20 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)