Consistent segfault in complex query

From: Kyle Samson <kysamson(at)tripadvisor(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Matthew Kelly <mkelly(at)tripadvisor(dot)com>
Subject: Consistent segfault in complex query
Date: 2018-09-12 14:17:25
Message-ID: A033A40A-B234-4324-BE37-272279F7B627@tripadvisor.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

We encountered a query that has been able to frequently segfault one of our postgres instances under certain conditions which we have not fully been able to isolate for reproduction. We were able to get a core dump out of one of the crashes and have poked at it, but we believe the answer is beyond our knowledge of postgres internals. This is on a 9.3.19 server and we saw no mention of a fix in the release notes since this version and we do not know if it affects later major releases as well.

What we know so far is that somehow the arraycontains function was given a datum of 0 as the second argument that dereferenced to a null pointer. Our current hypothesis from poking at the core dump is that some memory context is getting freed before it should. This assumption comes from the complexity in the query (CTE containing params being repeatedly evaluated by multiple case statements) and the unpredictability of the failure case.

The issue is easily avoidable, and we have asked the developer to solve their problem differently. However, the existence of a segfault is always concerning and we are reporting this issue in an effort to be conscientiousness members of the community.

Due to the potentially sensitive contents we cannot provide the core directly, but we are happy to run commands against the core file to extract debugging information. We have also replaced certain values (database name, table name, column name) with generic identifiers.

version
----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.19 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit
(1 row)

Query:

WITH tmp(foo2, bar2, baz2, minutes2) AS
(
SELECT ARRAY[$1 ::text], ARRAY[$2 ::text], ARRAY[$3 ::text], $4 ::double precision
)

UPDATE table_name
SET occurrences = CASE
WHEN time_last_noticed >= current_timestamp - (SELECT minutes2 FROM tmp) * interval '1 minutes'
THEN occurrences + 1
ELSE 1
END,
foo = CASE
WHEN time_last_noticed < current_timestamp - (SELECT minutes2 FROM tmp) * interval '1 minutes'
THEN (SELECT foo2 FROM tmp)
WHEN foo @> (SELECT foo2 FROM tmp)
THEN foo
ELSE array_cat(foo, (SELECT foo2 FROM tmp))
END,
bar = CASE
WHEN time_last_noticed < current_timestamp - (SELECT minutes2 FROM tmp) * interval '1 minutes'
THEN (SELECT bar2 FROM tmp)
WHEN bar @> (SELECT bar2 FROM tmp)
THEN bar
ELSE array_cat(bar, (SELECT bar2 FROM tmp))
END,
baz = CASE
WHEN time_last_noticed < current_timestamp - (SELECT minutes2 FROM tmp) * interval '1 minutes'
THEN (SELECT baz2 FROM tmp)
WHEN baz @> (SELECT baz2 FROM tmp)
THEN baz
ELSE array_cat(baz, (SELECT baz2 FROM tmp))
END,
time_last_noticed = current_timestamp,
total_occurrences = total_occurrences + 1
WHERE id1 = $5 AND id2 = $6;

Table schema:

Table "public.table_name"
Column | Type | Modifiers | Storage | Stats target | Description
-------------------+-----------------------------+-----------+----------+--------------+-------------
id1 | text | not null | extended | |
id2 | text | not null | extended | |
occurrences | integer | not null | plain | |
time_last_noticed | timestamp without time zone | not null | plain | |
total_occurrences | integer | not null | plain | |
bar | text[] | | extended | |
baz | text[] | | extended | |
foo | text[] | | extended | |
Indexes:
"table_name_pkey" PRIMARY KEY, btree (id1, id2)

Back trace from the dump:

(gdb) bt
#0 pg_detoast_datum (datum=0x0) at fmgr.c:2241
#1 0x000000000067cd90 in arraycontains (fcinfo=0x2c56e30) at arrayfuncs.c:3841
#2 0x000000000058ba25 in ExecMakeFunctionResultNoSets (fcache=0x2c56dc0, econtext=0x2c500c0, isNull=0x7ffc571d9a3f "", isDone=<value optimized out>) at execQual.c:2027
#3 0x0000000000587375 in ExecEvalCase (caseExpr=0x2c54a30, econtext=0x2c500c0, isNull=0x2c60967 "", isDone=0x2c60c4c) at execQual.c:2985
#4 0x00000000005878c3 in ExecTargetList (projInfo=<value optimized out>, isDone=0x7ffc571d9b0c) at execQual.c:5322
#5 ExecProject (projInfo=<value optimized out>, isDone=0x7ffc571d9b0c) at execQual.c:5537
#6 0x000000000058dd62 in ExecScan (node=0x2c4ffb0, accessMtd=0x599340 <IndexNext>, recheckMtd=0x5992f0 <IndexRecheck>) at execScan.c:207
#7 0x0000000000586e28 in ExecProcNode (node=0x2c4ffb0) at execProcnode.c:404
#8 0x0000000000584472 in EvalPlanQualNext (epqstate=<value optimized out>) at execMain.c:2366
#9 0x0000000000584b47 in EvalPlanQual (estate=0x2bdd240, epqstate=0x2c0d638, relation=<value optimized out>, rti=1, lockmode=<value optimized out>, tid=0x7ffc571d9c60, priorXmax=2185728490) at execMain.c:1951
#10 0x000000000059d7cb in ExecUpdate (node=0x2c0d598) at nodeModifyTable.c:727
#11 ExecModifyTable (node=0x2c0d598) at nodeModifyTable.c:997
#12 0x0000000000586e78 in ExecProcNode (node=0x2c0d598) at execProcnode.c:377
#13 0x0000000000585a82 in ExecutePlan (queryDesc=0x2bdd000, direction=<value optimized out>, count=0) at execMain.c:1488
#14 standard_ExecutorRun (queryDesc=0x2bdd000, direction=<value optimized out>, count=0) at execMain.c:318
#15 0x00007f2e5f3b358b in explain_ExecutorRun (queryDesc=0x2bdd000, direction=ForwardScanDirection, count=0) at auto_explain.c:231
#16 0x00007f2e5f1af495 in pgss_ExecutorRun (queryDesc=0x2bdd000, direction=ForwardScanDirection, count=0) at pg_stat_statements.c:716
#17 0x000000000066553f in ProcessQuery (plan=0x2be32b0, sourceText=0x2be7fe0 "WITH tmp(foo2, bar2, baz2, minutes2) AS ( SELECT ARRAY[$1 ::text], ARRAY[$2 ::text], ARRAY[$3 ::text], $4 ::double precision ) UPDATE table_name SET occurrences = CAS"..., params=0x2bdce30, dest=<value optimized out>, completionTag=0x7ffc571da2f0 "") at pquery.c:185
#18 0x000000000066576f in PortalRunMulti (portal=0x2bd2530, isTopLevel=1 '\001', dest=0xb1d380, altdest=0xb1d380, completionTag=0x7ffc571da2f0 "") at pquery.c:1275
#19 0x0000000000665e32 in PortalRun (portal=0x2bd2530, count=9223372036854775807, isTopLevel=1 '\001', dest=0x2ba30b0, altdest=0x2ba30b0, completionTag=0x7ffc571da2f0 "") at pquery.c:812
#20 0x000000000066409d in exec_execute_message (argc=<value optimized out>, argv=<value optimized out>, dbname=0x2965280 "database_name", username=<value optimized out>) at postgres.c:1958
#21 PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x2965280 "database_name", username=<value optimized out>) at postgres.c:4154
#22 0x000000000061df38 in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4194
#23 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3857
#24 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1702
#25 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1326
#26 0x00000000005b92f8 in main (argc=5, argv=0x2964470) at main.c:234

fcinfo contents from the arraycontains in frame #1

(gdb) print *fcinfo
$1 = {flinfo = 0x2c56de0, context = 0x0, resultinfo = 0x0, fncollation = 100, isnull = 0 '\000', nargs = 2, arg = {46431788, 0 <repeats 99 times>}, argnull = '\000' <repeats 99 times>}

- Kyle Samson

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-09-12 14:50:25 Re: [Patch] Create a new session in postmaster by calling setsid()
Previous Message Fabien COELHO 2018-09-12 14:04:09 Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors