(not a PG bug, please ignore) Re: PostgreSQL 11.3 hangs forever during DEALLOCATE statement

From: Tobias Gierke <tobias(dot)gierke(at)voipfuture(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: (not a PG bug, please ignore) Re: PostgreSQL 11.3 hangs forever during DEALLOCATE statement
Date: 2019-06-11 14:35:43
Message-ID: 34189fc1-f8e8-69fd-d8d4-d6dd1a7a6d19@voipfuture.com
Lists: pgsql-bugs

Hi,

After more investigation, it turned out that this was actually a deadlock
across two machines caused by a lock-ordering issue.
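For anyone who lands here with the same symptom, a lock-ordering deadlock can be reproduced in miniature. The sketch below is a simplified, single-process model, not PostgreSQL's lock manager. It replays lock requests from two hypothetical transactions A and B (the names run, row_1 and row_2 are made up for illustration), records who waits for whom, and reports a cycle in that waits-for graph as a deadlock. Lock release at commit is not modeled.

```python
def run(schedule):
    """Replay (transaction, lock) requests in order.

    Returns the list of transactions forming a deadlock cycle, or None.
    Simplified model: every lock is exclusive and nothing is ever released.
    """
    holder = {}      # lock name -> transaction holding it
    waits_for = {}   # blocked transaction -> transaction it waits for
    for txn, lock in schedule:
        if txn in waits_for:
            continue  # a blocked transaction cannot issue more requests
        owner = holder.get(lock)
        if owner is None or owner == txn:
            holder[lock] = txn  # lock is free (or already ours): grant it
            continue
        waits_for[txn] = owner
        # Walk the waits-for chain from the holder; arriving back at txn
        # means the transactions wait for each other in a cycle.
        path, cur = [txn], owner
        while cur != txn and cur in waits_for:
            path.append(cur)
            cur = waits_for[cur]
        if cur == txn:
            return path
    return None


# Opposite order: A takes row_1 then row_2, B takes row_2 then row_1.
opposite = [("A", "row_1"), ("B", "row_2"), ("A", "row_2"), ("B", "row_1")]
# Consistent order: both take row_1 first, so B simply queues behind A.
consistent = [("A", "row_1"), ("B", "row_1"), ("A", "row_2"), ("B", "row_2")]

print(run(opposite))    # ['B', 'A']
print(run(consistent))  # None
```

With a consistent acquisition order, the second transaction only waits; it can never end up holding something the first one needs.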

Sorry for the noise,
Tobias

> Hi,
>
> We have a distributed system (all CentOS 7 with PG 11.3 compiled from
> source) that uses postgres_fdw to forward changes made to a local table
> to a remote server.
>
> Today we noticed that one process got stuck with wait_event_type
> 'Extension'. Attaching GDB to the process on the local server (see
> further below) showed that the extension in question was actually
> postgres_fdw. Further inspection on the remote server showed that the
> connection was stuck in a DEALLOCATE call that never returned.
>
> *** Local server ***
>
> Blocked statement:
>
> voip=# select
> pid,wait_event_type,wait_event,datid,datname,backend_start,xact_start,query_start,state_change,backend_xid,backend_xmin,backend_type,substring(query,1,50)
> from pg_stat_activity where state <> 'idle';
>   pid  | wait_event_type | wait_event | datid | datname |         backend_start         |          xact_start           |          query_start          |         state_change          | backend_xid | backend_xmin |  backend_type  |                     substring
> -------+-----------------+------------+-------+---------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------+--------------+----------------+----------------------------------------------------
>  25301 | Extension       | Extension  | 16689 | voip    | 2019-06-11 11:47:32.75992+02  | 2019-06-11 12:24:46.554381+02 | 2019-06-11 12:24:46.55486+02  | 2019-06-11 12:24:46.55486+02  |    12198484 |     12198483 | client backend | UPDATE importer.config SET value='f' WHERE
> ....
>
>
> The table updated by the query has a trigger function that
> synchronizes the changes to a remote table:
>
>
>                      Table "importer.config"
>   Column  |  Type  | Collation | Nullable | Default
> ----------+--------+-----------+----------+-------------------------------------------------
>  id       | bigint |           | not null | nextval('importer.config_new_id_seq'::regclass)
>  param    | text   |           | not null |
>  value    | text   |           |          |
>  hostname | text   |           | not null |
>  filetype | text   |           | not null |
> Indexes:
>     "config_param_hostname_filetype_key" UNIQUE CONSTRAINT, btree (param, hostname, filetype)
> Triggers:
>     importer_config_trigger AFTER INSERT OR DELETE OR UPDATE ON importer.config FOR EACH STATEMENT EXECUTE PROCEDURE notify_importer_config_change()
>     sync_importer_config AFTER INSERT OR DELETE OR UPDATE OR TRUNCATE ON importer.config FOR EACH STATEMENT EXECUTE PROCEDURE importer.trig_sync_importer_config()
>
> voip=# \dfS+ importer.trig_sync_importer_config()
>
> BEGIN
> DELETE FROM remote_qdc_importer.config;
> INSERT INTO remote_qdc_importer.config SELECT * FROM importer.config;
> RETURN NEW;
> END;
>
>
> Remote table definition:
>
> voip=# \d remote_qdc_importer.config;
>                   Foreign table "remote_qdc_importer.config"
>   Column  |  Type  | Collation | Nullable | Default |       FDW options
> ----------+--------+-----------+----------+---------+--------------------------
>  id       | bigint |           | not null |         | (column_name 'id')
>  param    | text   |           | not null |         | (column_name 'param')
>  value    | text   |           |          |         | (column_name 'value')
>  hostname | text   |           | not null |         | (column_name 'hostname')
>  filetype | text   |           | not null |         | (column_name 'filetype')
> Server: qdc
> FDW options: (schema_name 'importer', table_name 'config')
>
> Backtrace on the local server (PID 25301)
>
>
> #0  0x00007fe8a32cd463 in __epoll_wait_nocancel () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007fe8a32cd463 in __epoll_wait_nocancel () from /lib64/libc.so.6
> #1  0x000000000070c81e in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffd7bfc0fc0, cur_timeout=-1, set=0x24f1900) at latch.c:1048
> #2  WaitEventSetWait (set=set@entry=0x24f1900, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffd7bfc0fc0, nevents=nevents@entry=1,
>     wait_event_info=wait_event_info@entry=117440512) at latch.c:1000
> #3  0x000000000070cc57 in WaitLatchOrSocket (latch=0x7fe8967e9fc4, wakeEvents=wakeEvents@entry=3, sock=26, timeout=timeout@entry=-1,
>     wait_event_info=wait_event_info@entry=117440512) at latch.c:385
> #4  0x00007fe89a9dbc03 in pgfdw_get_result (conn=0x2880170, query=0x2348c90 "DELETE FROM importer.config") at connection.c:548
> #5  0x00007fe89a9d5a9e in execute_dml_stmt (node=0x24d4710, node=0x24d4710) at postgres_fdw.c:3787
> #6  postgresIterateDirectModify (node=0x24d4710) at postgres_fdw.c:2553
> #7  0x000000000061fabc in ForeignNext (node=node@entry=0x24d4710) at nodeForeignscan.c:52
> #8  0x00000000005fe2e5 in ExecScanFetch (recheckMtd=0x61f9f0 <ForeignRecheck>, accessMtd=0x61fa80 <ForeignNext>, node=0x24d4710) at execScan.c:95
> #9  ExecScan (node=0x24d4710, accessMtd=0x61fa80 <ForeignNext>, recheckMtd=0x61f9f0 <ForeignRecheck>) at execScan.c:162
> #10 0x00000000006186bb in ExecProcNode (node=0x24d4710) at ../../../src/include/executor/executor.h:247
> #11 ExecModifyTable (pstate=0x24d3db0) at nodeModifyTable.c:2025
> #12 0x00000000005f638a in ExecProcNode (node=0x24d3db0) at ../../../src/include/executor/executor.h:247
> #13 ExecutePlan (execute_once=<optimized out>, dest=0xccd060 <spi_printtupDR>, direction=<optimized out>, numberTuples=0, sendTuples=false, operation=CMD_DELETE,
>     use_parallel_mode=<optimized out>, planstate=0x24d3db0, estate=0x24d3850) at execMain.c:1723
> #14 standard_ExecutorRun (queryDesc=0x2aa39d0, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364
> #15 0x00007fe89b23ff85 in pgss_ExecutorRun (queryDesc=0x2aa39d0, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:892
> #16 0x0000000000626969 in _SPI_pquery (tcount=0, fire_triggers=true, queryDesc=0x2aa39d0) at spi.c:2501
> #17 _SPI_execute_plan (plan=plan@entry=0x1e73820, paramLI=paramLI@entry=0x0, snapshot=snapshot@entry=0x0, crosscheck_snapshot=crosscheck_snapshot@entry=0x0,
>     read_only=read_only@entry=false, fire_triggers=fire_triggers@entry=true, tcount=tcount@entry=0) at spi.c:2263
> #18 0x0000000000626d68 in SPI_execute_plan_with_paramlist (plan=0x1e73820, params=0x0, read_only=<optimized out>, tcount=0) at spi.c:540
> #19 0x00007fe80c6b16ef in exec_stmt_execsql (estate=estate@entry=0x7ffd7bfc18e0, stmt=stmt@entry=0x1e67470) at pl_exec.c:4115
> #20 0x00007fe80c6b2deb in exec_stmt (stmt=0x1e67470, estate=0x7ffd7bfc18e0) at pl_exec.c:1986
> #21 exec_stmts (estate=0x7ffd7bfc18e0, stmts=<optimized out>) at pl_exec.c:1877
> #22 0x00007fe80c6b4f42 in exec_stmt_block (estate=estate@entry=0x7ffd7bfc18e0, block=0x1e70e20) at pl_exec.c:1818
> #23 0x00007fe80c6b5690 in plpgsql_exec_trigger (func=func@entry=0x1cb2b98, trigdata=0x7ffd7bfc20f0) at pl_exec.c:958
> #24 0x00007fe80c6a9382 in plpgsql_call_handler (fcinfo=0x7ffd7bfc1c70) at pl_handler.c:254
> #25 0x00000000005d28c8 in ExecCallTriggerFunc (trigdata=trigdata@entry=0x7ffd7bfc20f0, tgindx=tgindx@entry=1, finfo=0x2bac7c0, finfo@entry=0x2bac790,
>     instr=instr@entry=0x0, per_tuple_context=per_tuple_context@entry=0x27e73a0) at trigger.c:2412
> #26 0x00000000005d3f1d in AfterTriggerExecute (trigdesc=0x2bac5d0, trigdesc=0x2bac5d0, trig_tuple_slot2=0x0, trig_tuple_slot1=0x0, per_tuple_context=0x27e73a0, instr=0x0,
>     finfo=0x2bac790, rel=0x1d1f5f8, event=0x2a6127c) at trigger.c:4359
> #27 afterTriggerInvokeEvents (events=events@entry=0x2c28790, firing_id=1, estate=estate@entry=0x2bac280, delete_ok=delete_ok@entry=false) at trigger.c:4550
> #28 0x00000000005d9397 in AfterTriggerEndQuery (estate=estate@entry=0x2bac280) at trigger.c:4860
> #29 0x00000000005f6609 in standard_ExecutorFinish (queryDesc=0x2c2f6a0) at execMain.c:439
> #30 0x00007fe89b23fe95 in pgss_ExecutorFinish (queryDesc=0x2c2f6a0) at pg_stat_statements.c:915
> #31 0x000000000072e6e8 in ProcessQuery (plan=<optimized out>,
>     sourceText=0x2c2f610 "UPDATE importer.config SET value='f' WHERE param='active' AND hostname = 'dc-int' AND filetype='ssa5'", params=0x0, queryEnv=0x0,
>     dest=0xccd120 <donothingDR>, completionTag=0x7ffd7bfc2600 "UPDATE 1") at pquery.c:205
> #32 0x000000000072e8c2 in PortalRunMulti (portal=portal@entry=0x1d0f6a0, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false,
>     dest=0xccd120 <donothingDR>, dest@entry=0x1c88a70, altdest=0xccd120 <donothingDR>, altdest@entry=0x1c88a70,
>     completionTag=completionTag@entry=0x7ffd7bfc2600 "UPDATE 1") at pquery.c:1286
> #33 0x000000000072f38d in PortalRun (portal=portal@entry=0x1d0f6a0, count=count@entry=1, isTopLevel=isTopLevel@entry=true, run_once=<optimized out>,
>     dest=dest@entry=0x1c88a70, altdest=altdest@entry=0x1c88a70, completionTag=completionTag@entry=0x7ffd7bfc2600 "UPDATE 1") at pquery.c:799
> #34 0x000000000072ce8d in exec_execute_message (max_rows=1, portal_name=0x1c88660 "") at postgres.c:2036
> #35 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1cb1358, dbname=0x1cb1148 "voip", username=<optimized out>) at postgres.c:4245
> #36 0x000000000047e313 in BackendRun (port=0x1caedd0) at postmaster.c:4358
> #37 BackendStartup (port=0x1caedd0) at postmaster.c:4030
> #38 ServerLoop () at postmaster.c:1707
>
> *** Remote server ***
>
>
> postgres=# select
> pid,wait_event_type,wait_event,datid,datname,backend_start,xact_start,query_start,state_change,backend_xid,backend_xmin,backend_type,substring(query,1,50)
> from pg_stat_activity where state <> 'idle';
>   pid  | wait_event_type |  wait_event   | datid |    datname    |         backend_start         |          xact_start           |          query_start          |         state_change          | backend_xid | backend_xmin |  backend_type  |                     substring
> -------+-----------------+---------------+-------+---------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------+--------------+----------------+----------------------------------------------------
>  25536 | Lock            | transactionid | 16401 | datacollector | 2019-06-11 12:24:46.567576+02 | 2019-06-11 12:24:46.607314+02 | 2019-06-11 12:24:46.615009+02 | 2019-06-11 12:24:46.61501+02  |     4292464 |      4292463 | client backend | DELETE FROM importer.config
>  10183 | Client          | ClientRead    | 16401 | datacollector | 2019-06-11 11:58:09.340121+02 | 2019-06-11 12:24:45.870476+02 | 2019-06-11 12:24:46.711237+02 | 2019-06-11 12:24:46.711292+02 |     4292463 |      4292463 | client backend | DEALLOCATE pgsql_fdw_prep_178
> (3 rows)
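A note on reading these two rows: pg_stat_activity.query shows the statement currently running only when state is 'active'; in every other state it shows the last statement the backend executed. Since the query above excludes state = 'idle', a backend waiting on Client/ClientRead here is most likely idle in transaction. That would mean DEALLOCATE pgsql_fdw_prep_178 had already completed and PID 10183 was holding its transaction open while waiting for its client. The helper below is a hypothetical triage sketch over rows copied from that output; the function name and messages are made up.

```python
# Two rows copied from the remote pg_stat_activity output above.
ROWS = [
    {"pid": 25536, "wait_event_type": "Lock", "wait_event": "transactionid",
     "query": "DELETE FROM importer.config"},
    {"pid": 10183, "wait_event_type": "Client", "wait_event": "ClientRead",
     "query": "DEALLOCATE pgsql_fdw_prep_178"},
]


def triage(row):
    """Hypothetical helper: one-line reading of a non-idle pg_stat_activity row."""
    kind, event, pid = row["wait_event_type"], row["wait_event"], row["pid"]
    if (kind, event) == ("Client", "ClientRead"):
        # Waiting for the next command: `query` is the LAST statement run,
        # not one that is still executing.
        return f"pid {pid}: waiting for its client, last ran {row['query']!r}"
    if kind == "Lock":
        return f"pid {pid}: blocked on a {event} lock while running {row['query']!r}"
    return f"pid {pid}: waiting on {kind}/{event}"


for row in ROWS:
    print(triage(row))
```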
>
> postgres=# select * from pg_locks where pid=25536 and granted=false;
>    locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |   mode    | granted | fastpath
> ---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+-----------+---------+----------
>  transactionid |          |          |      |       |            |       4292463 |         |       |          | 52/155384          | 25536 | ShareLock | f       | f
> (1 row)
>
> postgres=# select * from pg_locks where transactionid=4292463;
>    locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |     mode      | granted | fastpath
> ---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------+---------+----------
>  transactionid |          |          |      |       |            |       4292463 |         |       |          | 52/155384          | 25536 | ShareLock     | f       | f
>  transactionid |          |          |      |       |            |       4292463 |         |       |          | 56/28816           | 10183 | ExclusiveLock | t       | f
> (2 rows)
>
>
> postgres=# select * from pg_locks where pid=10183;
>    locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |       mode       | granted | fastpath
> ---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+------------------+---------+----------
>  relation      |    16401 |    16874 |      |       |            |               |         |       |          | 56/28816           | 10183 | RowExclusiveLock | t       | t
>  virtualxid    |          |          |      |       | 56/28816   |               |         |       |          | 56/28816           | 10183 | ExclusiveLock    | t       | t
>  transactionid |          |          |      |       |            |       4292463 |         |       |          | 56/28816           | 10183 | ExclusiveLock    | t       | f
>  relation      |    16401 |    16867 |      |       |            |               |         |       |          | 56/28816           | 10183 | RowExclusiveLock | t       | f
> (4 rows)
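These pg_locks queries show the usual pattern for a row-level conflict: the waiter requests ShareLock on the other transaction's transactionid and is not granted, while the owning backend holds ExclusiveLock on that same transactionid. Here is a minimal sketch of turning such rows into waits-for edges. It assumes that only transactionid locks matter, and the function name is made up. On PostgreSQL 9.6 and later, the built-in pg_blocking_pids(pid) function reports blockers server-side for all lock types.

```python
from collections import defaultdict

# (locktype, transactionid, pid, mode, granted), copied from the output above.
LOCKS = [
    ("transactionid", 4292463, 25536, "ShareLock", False),
    ("relation", None, 10183, "RowExclusiveLock", True),
    ("virtualxid", None, 10183, "ExclusiveLock", True),
    ("transactionid", 4292463, 10183, "ExclusiveLock", True),
    ("relation", None, 10183, "RowExclusiveLock", True),
]


def blocked_by(locks):
    """Map each waiting pid to the pids holding the transactionid it waits on."""
    holders = defaultdict(set)
    for locktype, xid, pid, _mode, granted in locks:
        if locktype == "transactionid" and granted:
            holders[xid].add(pid)
    edges = {}
    for locktype, xid, pid, _mode, granted in locks:
        if locktype == "transactionid" and not granted:
            edges.setdefault(pid, set()).update(holders[xid] - {pid})
    return edges


print(blocked_by(LOCKS))  # {25536: {10183}}
```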
>
>
>
> (gdb) bt
> #0  0x00007f71bb99d463 in __epoll_wait_nocancel () from /lib64/libc.so.6
> #1  0x000000000070c81e in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffdde17a910, cur_timeout=-1, set=0x22b50e8) at latch.c:1048
> #2  WaitEventSetWait (set=0x22b50e8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffdde17a910, nevents=nevents@entry=1,
>     wait_event_info=wait_event_info@entry=100663296) at latch.c:1000
> #3  0x000000000062d4d3 in secure_read (port=0x22b2ed0, ptr=0xcda640 <PqRecvBuffer>, len=8192) at be-secure.c:173
> #4  0x00000000006364b8 in pq_recvbuf () at pqcomm.c:963
> #5  0x0000000000637085 in pq_getbyte () at pqcomm.c:1006
> #6  0x000000000072c4e2 in SocketBackend (inBuf=0x7ffdde17aa80) at postgres.c:341
> #7  ReadCommand (inBuf=0x7ffdde17aa80) at postgres.c:514
> #8  PostgresMain (argc=<optimized out>, argv=argv@entry=0x22b5348, dbname=0x22b5218 "datacollector", username=<optimized out>) at postgres.c:4124
> #9  0x000000000047e313 in BackendRun (port=0x22b2ed0) at postmaster.c:4358
> #10 BackendStartup (port=0x22b2ed0) at postmaster.c:4030
> #11 ServerLoop () at postmaster.c:1707
> #12 0x00000000006c3719 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x22842d0) at postmaster.c:1380
> #13 0x000000000047ed8b in main (argc=5, argv=0x22842d0) at main.c:228
>
> Any advice on how to debug this further?
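Why no error was raised: each PostgreSQL server detects deadlocks only among its own backends' lock waits. A postgres_fdw backend waiting for its remote query (wait_event_type Extension, as in the local backtrace) is not waiting on a lock at all. Joining the waits-for edges from both machines is what exposes the cycle. In the sketch below, the first two edges come from the output above. Local PID 25301 is waiting for the remote DELETE FROM importer.config, apparently executed by remote PID 25536, which is blocked by PID 10183. The last two edges are hypothetical placeholders, because the thread does not show which backend drives PID 10183's transaction or what that backend waits for. (PID 10183's last statement, DEALLOCATE pgsql_fdw_prep_178, uses postgres_fdw's prepared-statement naming, which suggests another postgres_fdw connection.) The function names are made up for illustration.

```python
# Waits-for edges between (host, pid) pairs; each backend waits on one thing.
EDGES = {
    ("local", 25301): ("remote", 25536),   # postgres_fdw waiting for remote DELETE
    ("remote", 25536): ("remote", 10183),  # transactionid lock, from pg_locks
    # 10183 waits for its client (ClientRead); WHO that client is: HYPOTHETICAL.
    ("remote", 10183): ("client", "?"),
    # HYPOTHETICAL: that client blocked on a lock held by local 25301.
    ("client", "?"): ("local", 25301),
}


def find_cycle(edges):
    """Follow the single out-edge from each node; return the first cycle or None."""
    for start in edges:
        seen, node = [], start
        while node in edges and node not in seen:
            seen.append(node)
            node = edges[node]
        if node in seen:
            return seen[seen.index(node):]
    return None


def one_server_view(edges, host):
    """Edges with both ends on one host: all that server's detector can see."""
    return {a: b for a, b in edges.items() if a[0] == host and b[0] == host}


print(find_cycle(EDGES))                             # 4-node cycle across machines
print(find_cycle(one_server_view(EDGES, "local")))   # None
print(find_cycle(one_server_view(EDGES, "remote")))  # None
```

Because neither server can break such a cycle by itself, it persists until something external intervenes: for example, statement_timeout or lock_timeout on one side, or a manual pg_cancel_backend().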
>
> Regards,
> Tobias
>
--
Tobias Gierke
Software Developer

Voipfuture GmbH Wendenstr. 4 20097 Hamburg Germany
Phone +49 40 688 9001 64 Fax +49 40 688 9001 99 www.voipfuture.com
Managing Directors Jan Bastian Eyal Ullert
Commercial Court AG Hamburg HRB 109896 VAT ID DE263738086
