PostgreSQL 11.3 hangs forever during DEALLOCATE statement

From: Tobias Gierke <tobias(dot)gierke(at)voipfuture(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: PostgreSQL 11.3 hangs forever during DEALLOCATE statement
Date: 2019-06-11 11:49:24
Message-ID: 434d5a8d-cc48-bd4a-f839-d3f58df10ae8@voipfuture.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

We have a distributed system (all CentOS 7 with PG 11.3 compiled from
sources) that makes use of pg_fdw to forward changes to a local table to
a remote server.

Today we noticed that one process got stuck with wait_event_type
'Extension'. Attaching GDB to the process on the local server (see
further below) showed that the extension in question was actually
pg_fdw. Further inspection on the remote server showed that the
connection was stuck in a DEALLOCATE call that never returned.

*** Local server ***

Blocked statement:

voip=# select
pid,wait_event_type,wait_event,datid,datname,backend_start,xact_start,query_start,state_change,backend_xid,backend_xmin,backend_type,substring(query,1,50)
from pg_stat_activity where state <> 'idle';
  pid  | wait_event_type | wait_event | datid | datname |
backend_start         |          xact_start           |
query_start          |         state_change          | backend_xid |
backend_xmin |  backend_type  | substring
-------+-----------------+------------+-------+---------+-------------------------------+-------------------------------+-------------------------------+-------------------
------------+-------------+--------------+----------------+----------------------------------------------------
 25301 | Extension       | Extension  | 16689 | voip    | 2019-06-11
11:47:32.75992+02  | 2019-06-11 12:24:46.554381+02 | 2019-06-11
12:24:46.55486+02  | 2019-06-11 12:24:46.55486+02  | 12198484 |    
12198483 | client backend | UPDATE importer.config SET value='f' WHERE ....

The table table updated in the query has a trigger function that
synchronizes the changes to a remote table:

                     Table "importer.config"
  Column  |  Type  | Collation | Nullable | Default
----------+--------+-----------+----------+-------------------------------------------------
 id       | bigint |           | not null |
nextval('importer.config_new_id_seq'::regclass)
 param    | text   |           | not null |
 value    | text   |           |          |
 hostname | text   |           | not null |
 filetype | text   |           | not null |
Indexes:
    "config_param_hostname_filetype_key" UNIQUE CONSTRAINT, btree
(param, hostname, filetype)
Triggers:
    importer_config_trigger AFTER INSERT OR DELETE OR UPDATE ON
importer.config FOR EACH STATEMENT EXECUTE PROCEDURE
notify_importer_config_change()
    sync_importer_config AFTER INSERT OR DELETE OR UPDATE OR TRUNCATE
ON importer.config FOR EACH STATEMENT EXECUTE PROCEDURE
importer.trig_sync_importer_config()

voip=# \dfS+ importer.trig_sync_importer_config()

BEGIN
DELETE FROM remote_qdc_importer.config;
INSERT INTO remote_qdc_importer.config SELECT * FROM importer.config;
RETURN NEW;
END;

Remote table definition:

voip=# \d remote_qdc_importer.config;
                  Foreign table "remote_qdc_importer.config"
  Column  |  Type  | Collation | Nullable | Default |       FDW options
----------+--------+-----------+----------+---------+--------------------------
 id       | bigint |           | not null |         | (column_name 'id')
 param    | text   |           | not null |         | (column_name 'param')
 value    | text   |           |          |         | (column_name 'value')
 hostname | text   |           | not null |         | (column_name
'hostname')
 filetype | text   |           | not null |         | (column_name
'filetype')
Server: qdc
FDW options: (schema_name 'importer', table_name 'config')

Backtrace on the local server (PID 25301)

#0  0x00007fe8a32cd463 in __epoll_wait_nocancel () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fe8a32cd463 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x000000000070c81e in WaitEventSetWaitBlock (nevents=1,
occurred_events=0x7ffd7bfc0fc0, cur_timeout=-1, set=0x24f1900) at
latch.c:1048
#2  WaitEventSetWait (set=set(at)entry=0x24f1900, timeout=timeout(at)entry=-1,
occurred_events=occurred_events(at)entry=0x7ffd7bfc0fc0,
nevents=nevents(at)entry=1,
    wait_event_info=wait_event_info(at)entry=117440512) at latch.c:1000
#3  0x000000000070cc57 in WaitLatchOrSocket (latch=0x7fe8967e9fc4,
wakeEvents=wakeEvents(at)entry=3, sock=26, timeout=timeout(at)entry=-1,
    wait_event_info=wait_event_info(at)entry=117440512) at latch.c:385
#4  0x00007fe89a9dbc03 in pgfdw_get_result (conn=0x2880170,
query=0x2348c90 "DELETE FROM importer.config") at connection.c:548
#5  0x00007fe89a9d5a9e in execute_dml_stmt (node=0x24d4710,
node=0x24d4710) at postgres_fdw.c:3787
#6  postgresIterateDirectModify (node=0x24d4710) at postgres_fdw.c:2553
#7  0x000000000061fabc in ForeignNext (node=node(at)entry=0x24d4710) at
nodeForeignscan.c:52
#8  0x00000000005fe2e5 in ExecScanFetch (recheckMtd=0x61f9f0
<ForeignRecheck>, accessMtd=0x61fa80 <ForeignNext>, node=0x24d4710) at
execScan.c:95
#9  ExecScan (node=0x24d4710, accessMtd=0x61fa80 <ForeignNext>,
recheckMtd=0x61f9f0 <ForeignRecheck>) at execScan.c:162
#10 0x00000000006186bb in ExecProcNode (node=0x24d4710) at
../../../src/include/executor/executor.h:247
#11 ExecModifyTable (pstate=0x24d3db0) at nodeModifyTable.c:2025
#12 0x00000000005f638a in ExecProcNode (node=0x24d3db0) at
../../../src/include/executor/executor.h:247
#13 ExecutePlan (execute_once=<optimized out>, dest=0xccd060
<spi_printtupDR>, direction=<optimized out>, numberTuples=0,
sendTuples=false, operation=CMD_DELETE,
    use_parallel_mode=<optimized out>, planstate=0x24d3db0,
estate=0x24d3850) at execMain.c:1723
#14 standard_ExecutorRun (queryDesc=0x2aa39d0, direction=<optimized
out>, count=0, execute_once=<optimized out>) at execMain.c:364
#15 0x00007fe89b23ff85 in pgss_ExecutorRun (queryDesc=0x2aa39d0,
direction=ForwardScanDirection, count=0, execute_once=<optimized out>)
at pg_stat_statements.c:892
#16 0x0000000000626969 in _SPI_pquery (tcount=0, fire_triggers=true,
queryDesc=0x2aa39d0) at spi.c:2501
#17 _SPI_execute_plan (plan=plan(at)entry=0x1e73820,
paramLI=paramLI(at)entry=0x0, snapshot=snapshot(at)entry=0x0,
crosscheck_snapshot=crosscheck_snapshot(at)entry=0x0,
    read_only=read_only(at)entry=false,
fire_triggers=fire_triggers(at)entry=true, tcount=tcount(at)entry=0) at spi.c:2263
#18 0x0000000000626d68 in SPI_execute_plan_with_paramlist
(plan=0x1e73820, params=0x0, read_only=<optimized out>, tcount=0) at
spi.c:540
#19 0x00007fe80c6b16ef in exec_stmt_execsql
(estate=estate(at)entry=0x7ffd7bfc18e0, stmt=stmt(at)entry=0x1e67470) at
pl_exec.c:4115
#20 0x00007fe80c6b2deb in exec_stmt (stmt=0x1e67470,
estate=0x7ffd7bfc18e0) at pl_exec.c:1986
#21 exec_stmts (estate=0x7ffd7bfc18e0, stmts=<optimized out>) at
pl_exec.c:1877
#22 0x00007fe80c6b4f42 in exec_stmt_block
(estate=estate(at)entry=0x7ffd7bfc18e0, block=0x1e70e20) at pl_exec.c:1818
#23 0x00007fe80c6b5690 in plpgsql_exec_trigger
(func=func(at)entry=0x1cb2b98, trigdata=0x7ffd7bfc20f0) at pl_exec.c:958
#24 0x00007fe80c6a9382 in plpgsql_call_handler (fcinfo=0x7ffd7bfc1c70)
at pl_handler.c:254
#25 0x00000000005d28c8 in ExecCallTriggerFunc
(trigdata=trigdata(at)entry=0x7ffd7bfc20f0, tgindx=tgindx(at)entry=1,
finfo=0x2bac7c0, finfo(at)entry=0x2bac790,
    instr=instr(at)entry=0x0,
per_tuple_context=per_tuple_context(at)entry=0x27e73a0) at trigger.c:2412
#26 0x00000000005d3f1d in AfterTriggerExecute (trigdesc=0x2bac5d0,
trigdesc=0x2bac5d0, trig_tuple_slot2=0x0, trig_tuple_slot1=0x0,
per_tuple_context=0x27e73a0, instr=0x0,
    finfo=0x2bac790, rel=0x1d1f5f8, event=0x2a6127c) at trigger.c:4359
#27 afterTriggerInvokeEvents (events=events(at)entry=0x2c28790,
firing_id=1, estate=estate(at)entry=0x2bac280,
delete_ok=delete_ok(at)entry=false) at trigger.c:4550
#28 0x00000000005d9397 in AfterTriggerEndQuery
(estate=estate(at)entry=0x2bac280) at trigger.c:4860
#29 0x00000000005f6609 in standard_ExecutorFinish (queryDesc=0x2c2f6a0)
at execMain.c:439
#30 0x00007fe89b23fe95 in pgss_ExecutorFinish (queryDesc=0x2c2f6a0) at
pg_stat_statements.c:915
#31 0x000000000072e6e8 in ProcessQuery (plan=<optimized out>,
    sourceText=0x2c2f610 "UPDATE importer.config SET value='f' WHERE
param='active' AND hostname = 'dc-int' AND filetype='ssa5'", params=0x0,
queryEnv=0x0,
    dest=0xccd120 <donothingDR>, completionTag=0x7ffd7bfc2600 "UPDATE
1") at pquery.c:205
#32 0x000000000072e8c2 in PortalRunMulti (portal=portal(at)entry=0x1d0f6a0,
isTopLevel=isTopLevel(at)entry=true,
setHoldSnapshot=setHoldSnapshot(at)entry=false,
    dest=0xccd120 <donothingDR>, dest(at)entry=0x1c88a70, altdest=0xccd120
<donothingDR>, altdest(at)entry=0x1c88a70,
    completionTag=completionTag(at)entry=0x7ffd7bfc2600 "UPDATE 1") at
pquery.c:1286
#33 0x000000000072f38d in PortalRun (portal=portal(at)entry=0x1d0f6a0,
count=count(at)entry=1, isTopLevel=isTopLevel(at)entry=true,
run_once=<optimized out>,
    dest=dest(at)entry=0x1c88a70, altdest=altdest(at)entry=0x1c88a70,
completionTag=completionTag(at)entry=0x7ffd7bfc2600 "UPDATE 1") at pquery.c:799
#34 0x000000000072ce8d in exec_execute_message (max_rows=1,
portal_name=0x1c88660 "") at postgres.c:2036
#35 PostgresMain (argc=<optimized out>, argv=argv(at)entry=0x1cb1358,
dbname=0x1cb1148 "voip", username=<optimized out>) at postgres.c:4245
#36 0x000000000047e313 in BackendRun (port=0x1caedd0) at postmaster.c:4358
#37 BackendStartup (port=0x1caedd0) at postmaster.c:4030
#38 ServerLoop () at postmaster.c:1707

*** Remote server ***

postgres=# select
pid,wait_event_type,wait_event,datid,datname,backend_start,xact_start,query_start,state_change,backend_xid,backend_xmin,backend_type,substring(query,1,50)
from pg_stat_activity where state <> 'idle';
  pid  | wait_event_type |  wait_event   | datid |    datname |        
backend_start         |          xact_start |         
query_start          |         s
tate_change          | backend_xid | backend_xmin |  backend_type
|                     substring
-------+-----------------+---------------+-------+---------------+-------------------------------+-------------------------------+-------------------------------+----------
---------------------+-------------+--------------+----------------+----------------------------------------------------
 25536 | Lock            | transactionid | 16401 | datacollector |
2019-06-11 12:24:46.567576+02 | 2019-06-11 12:24:46.607314+02 |
2019-06-11 12:24:46.615009+02 | 2019-06-11 12:24:46.61501+02 |    
4292464 |      4292463 | client backend | DELETE FROM importer.config
 10183 | Client          | ClientRead    | 16401 | datacollector |
2019-06-11 11:58:09.340121+02 | 2019-06-11 12:24:45.870476+02 |
2019-06-11 12:24:46.711237+02 | 2019-06-11 12:24:46.711292+02 |    
4292463 |      4292463 | client backend | DEALLOCATE pgsql_fdw_prep_178
(3 rows)

postgres=# select * from pg_locks where pid=25536 and granted=false;
   locktype    | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction | pid 
|   mode    | granted | fastpath
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+-----------+---------+----------
 transactionid |          |          |      |       | |       4292463
|         |       |          | 52/155384 | 25536 | ShareLock | f       | f
(1 row)

postgres=# select * from pg_locks where transactionid=4292463;
   locktype    | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction | pid 
|     mode      | granted | fastp
ath
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------+---------+------
----
 transactionid |          |          |      |       | |       4292463
|         |       |          | 52/155384 | 25536 | ShareLock     |
f       | f
 transactionid |          |          |      |       | |       4292463
|         |       |          | 56/28816 | 10183 | ExclusiveLock |
t       | f
(2 rows)

postgres=# select * from pg_locks where pid=10183;
   locktype    | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction | pid 
|       mode       | granted | fa
stpath
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+------------------+---------+---
-------
 relation      |    16401 |    16874 |      |       | |              
|         |       |          | 56/28816 | 10183 | RowExclusiveLock |
t       | t
 virtualxid    |          |          |      |       | 56/28816
|               |         |       |          | 56/28816 | 10183 |
ExclusiveLock    | t       | t
 transactionid |          |          |      |       | |       4292463
|         |       |          | 56/28816 | 10183 | ExclusiveLock    |
t       | f
 relation      |    16401 |    16867 |      |       | |              
|         |       |          | 56/28816 | 10183 | RowExclusiveLock |
t       | f
(4 rows)

(gdb) bt
#0  0x00007f71bb99d463 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x000000000070c81e in WaitEventSetWaitBlock (nevents=1,
occurred_events=0x7ffdde17a910, cur_timeout=-1, set=0x22b50e8) at
latch.c:1048
#2  WaitEventSetWait (set=0x22b50e8, timeout=timeout(at)entry=-1,
occurred_events=occurred_events(at)entry=0x7ffdde17a910,
nevents=nevents(at)entry=1,
    wait_event_info=wait_event_info(at)entry=100663296) at latch.c:1000
#3  0x000000000062d4d3 in secure_read (port=0x22b2ed0, ptr=0xcda640
<PqRecvBuffer>, len=8192) at be-secure.c:173
#4  0x00000000006364b8 in pq_recvbuf () at pqcomm.c:963
#5  0x0000000000637085 in pq_getbyte () at pqcomm.c:1006
#6  0x000000000072c4e2 in SocketBackend (inBuf=0x7ffdde17aa80) at
postgres.c:341
#7  ReadCommand (inBuf=0x7ffdde17aa80) at postgres.c:514
#8  PostgresMain (argc=<optimized out>, argv=argv(at)entry=0x22b5348,
dbname=0x22b5218 "datacollector", username=<optimized out>) at
postgres.c:4124
#9  0x000000000047e313 in BackendRun (port=0x22b2ed0) at postmaster.c:4358
#10 BackendStartup (port=0x22b2ed0) at postmaster.c:4030
#11 ServerLoop () at postmaster.c:1707
#12 0x00000000006c3719 in PostmasterMain (argc=argc(at)entry=5,
argv=argv(at)entry=0x22842d0) at postmaster.c:1380
#13 0x000000000047ed8b in main (argc=5, argv=0x22842d0) at main.c:228

Any advice on how to debug this further ?

Regards,
Tobias

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Masahiko Sawada 2019-06-11 12:08:36 Re: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)
Previous Message YunQiang Su 2019-06-11 05:12:54 Re: BUG #15844: MIPS: remove .set mips2 in s_lock.h to fix r6 build