Re: PostgreSQL's processes blocking each other are not detected as deadlock

From: Sorin Mircioiu <sorin(dot)mircioiu(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: PostgreSQL's processes blocking each other are not detected as deadlock
Date: 2023-09-27 08:32:45
Message-ID: CAO-WFfnB2EMFRoAd+Hyaqmmw_OHFkpN2eV==hA26HCAay-BadQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thank you for the answer.

I have not yet identified the current statements these functions are
executing.

select pid,
usename,
pg_blocking_pids(pid) as blocked_by,
query as blocked_query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0;

pid | usename | blocked_by | blocked_query
-------+------------+------------+-------------------------------
13502 | my_username | {21544} | SELECT * FROM my_procedure_1()
21544 | my_username | {13502} | SELECT * FROM my_procedure_2()
(2 rows)

The manual says about pg_blocking_pids:
1. One server process blocks another if it either holds a lock that
conflicts with the blocked process's lock request (hard block),
2. Or is waiting for a lock that would conflict with the blocked
process's lock request and is ahead of it in the wait queue (soft block).

Based on what you said I presume that pg_blocking_pids shows in my
situation that it is waiting for a lock.

SELECT * FROM pg_locks WHERE pid NOT IN (13502, 21544, (SELECT
pg_backend_pid()));
returns zero rows so my understanding is that there are no other locks from
other processes.

SELECT pid FROM pg_stat_activity WHERE state != 'idle' AND pid NOT IN
(13502,21544, (SELECT pg_backend_pid()));
returns zero rows so there are no other running processes.

Still, if there are no other running processes what could block a process
to wait for a lock ?

Below I present a gcore log. Both processes show the same output.

#0 0x00007f3569b9176a in epoll_wait () from /lib64/libc.so.6
#1 0x000055f4c3124471 in WaitEventSetWait ()
#2 0x000055f4c3124872 in WaitLatchOrSocket ()
#3 0x000055f4c313cc8b in ProcSleep ()
#4 0x000055f4c3132c3b in WaitOnLock ()
#5 0x000055f4c3132fa9 in LockAcquireExtended ()
#6 0x000055f4c3133ef6 in XactLockTableWait ()
#7 0x000055f4c2e49dcf in heap_update ()
#8 0x000055f4c2e51451 in heapam_tuple_update.lto_priv ()
#9 0x000055f4c2fee3cf in ExecUpdate ()
#10 0x000055f4c2fefce4 in ExecModifyTable ()
#11 0x000055f4c2fbff42 in standard_ExecutorRun ()
#12 0x000055f4c3147ec5 in ProcessQuery ()
#13 0x000055f4c3149c26 in PortalRunMulti ()
#14 0x000055f4c3149dba in FillPortalStore ()
#15 0x000055f4c314a424 in PortalRunFetch ()
#16 0x000055f4c2ffe536 in _SPI_cursor_operation.lto_priv.0 ()
#17 0x00007f35415d8145 in exec_for_query () from /usr/lib64/pgsql/plpgsql.so
#18 0x00007f35415d8be6 in exec_stmt () from /usr/lib64/pgsql/plpgsql.so
#19 0x00007f35415db55e in exec_stmts () from /usr/lib64/pgsql/plpgsql.so
#20 0x00007f35415da5fc in exec_stmt () from /usr/lib64/pgsql/plpgsql.so
#21 0x00007f35415db55e in exec_stmts () from /usr/lib64/pgsql/plpgsql.so
#22 0x00007f35415da5fc in exec_stmt () from /usr/lib64/pgsql/plpgsql.so
#23 0x00007f35415db55e in exec_stmts () from /usr/lib64/pgsql/plpgsql.so
#24 0x00007f35415da5fc in exec_stmt () from /usr/lib64/pgsql/plpgsql.so
#25 0x00007f35415db55e in exec_stmts () from /usr/lib64/pgsql/plpgsql.so
#26 0x00007f35415db8cc in exec_stmt_block () from
/usr/lib64/pgsql/plpgsql.so
#27 0x00007f35415d9b6b in exec_stmt () from /usr/lib64/pgsql/plpgsql.so
#28 0x00007f35415dab81 in plpgsql_exec_function () from
/usr/lib64/pgsql/plpgsql.so
#29 0x00007f35415e323b in plpgsql_call_handler () from
/usr/lib64/pgsql/plpgsql.so
#30 0x000055f4c2fb6863 in ExecInterpExpr ()
#31 0x000055f4c2fef60a in ExecResult ()
#32 0x000055f4c2fbff42 in standard_ExecutorRun ()
#33 0x000055f4c2ffdd2d in _SPI_execute_plan ()
#34 0x000055f4c2ffe126 in SPI_execute_plan_with_paramlist ()
#35 0x00007f35415d7c7c in exec_stmt_execsql () from
/usr/lib64/pgsql/plpgsql.so
#36 0x00007f35415d865b in exec_stmt () from /usr/lib64/pgsql/plpgsql.so
#37 0x00007f35415db55e in exec_stmts () from /usr/lib64/pgsql/plpgsql.so
#38 0x00007f35415db8cc in exec_stmt_block () from
/usr/lib64/pgsql/plpgsql.so
#39 0x00007f35415d9b6b in exec_stmt () from /usr/lib64/pgsql/plpgsql.so
#40 0x00007f35415dab81 in plpgsql_exec_function () from
/usr/lib64/pgsql/plpgsql.so
#41 0x00007f35415e323b in plpgsql_call_handler () from
/usr/lib64/pgsql/plpgsql.so
#42 0x000055f4c2fcc7da in ExecMakeTableFunctionResult ()
#43 0x000055f4c2fdd9fa in FunctionNext ()
#44 0x000055f4c2fbff42 in standard_ExecutorRun ()
#45 0x000055f4c3148375 in PortalRunSelect ()
#46 0x000055f4c314a0a6 in PortalRun ()
#47 0x000055f4c31453d3 in exec_simple_query.lto_priv ()
#48 0x000055f4c314c9ce in PostgresMain ()
#49 0x000055f4c30cdcdf in ServerLoop ()
#50 0x000055f4c30ceb57 in PostmasterMain ()
#51 0x000055f4c2e04171 in main ()

On Tue, Sep 26, 2023 at 5:02 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Sorin Mircioiu <sorin(dot)mircioiu(at)gmail(dot)com> writes:
> > I need answers to the following questions:
>
> > 1. 13502 is blocked by 21544 and 21544 is blocked by 13502. Why
> PostgreSQL
> > didn't detect this situation as a deadlock ?
>
> You have not demonstrated that these two processes are waiting for
> each other, only that they are waiting for something. The subset
> of data you presented is insufficient to conclude that a deadlock
> should have been reported. It does appear that some progress
> has been made since the previous deadlock, since neither process
> is waiting for the same thing it was waiting on as of the deadlock
> report.
>
> In the past we've seen cases where a deadlock existed but the
> server could not detect it because one step in the waits-for
> loop was implemented by client-side behavior rather than a
> server-visible lock request. Not saying that's happened here,
> but you might want to consider your application's behavior
> with that idea in mind.
>
> > why is ShareLock used for the locking system in my situation ?
>
> ShareLock on an XID is a different thing from ShareLock on a table.
> IIRC we request ShareLock on an XID as a way of waiting for the
> transaction using that XID to terminate, which is a need that only
> comes up in some edge cases like CREATE INDEX CONCURRENTLY. Have you
> identified exactly what statements those functions are executing?
>
> regards, tom lane
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Patrick Frießnegg 2023-09-27 08:43:22 PG_DUMPALL not executable within docker container
Previous Message Laurenz Albe 2023-09-27 08:04:24 Re: md5 password valid and invalid after upgrading