Re: pg13.2: invalid memory alloc request size NNNN

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg13.2: invalid memory alloc request size NNNN
Date: 2021-02-12 17:44:54
Message-ID: c27089c6-3a23-1769-6ec9-9012fef5d3b1@enterprisedb.com
Lists: pgsql-hackers

On 2/12/21 2:48 AM, Justin Pryzby wrote:
> ts=# \errverbose
> ERROR: XX000: invalid memory alloc request size 18446744073709551613
>
> #0 pg_re_throw () at elog.c:1716
> #1 0x0000000000a33b12 in errfinish (filename=0xbff20e "mcxt.c", lineno=959, funcname=0xbff2db <__func__.6684> "palloc") at elog.c:502
> #2 0x0000000000a6760d in palloc (size=18446744073709551613) at mcxt.c:959
> #3 0x00000000009fb149 in text_to_cstring (t=0x2aaae8023010) at varlena.c:212
> #4 0x00000000009fbf05 in textout (fcinfo=0x2094538) at varlena.c:557
> #5 0x00000000006bdd50 in ExecInterpExpr (state=0x2093990, econtext=0x20933d8, isnull=0x7fff5bf04a87) at execExprInterp.c:1112
> #6 0x00000000006d4f18 in ExecEvalExprSwitchContext (state=0x2093990, econtext=0x20933d8, isNull=0x7fff5bf04a87) at ../../../src/include/executor/executor.h:316
> #7 0x00000000006d4f81 in ExecProject (projInfo=0x2093988) at ../../../src/include/executor/executor.h:350
> #8 0x00000000006d5371 in ExecScan (node=0x20932c8, accessMtd=0x7082e0 <SeqNext>, recheckMtd=0x708385 <SeqRecheck>) at execScan.c:238
> #9 0x00000000007083c2 in ExecSeqScan (pstate=0x20932c8) at nodeSeqscan.c:112
> #10 0x00000000006d1b00 in ExecProcNodeInstr (node=0x20932c8) at execProcnode.c:466
> #11 0x00000000006e742c in ExecProcNode (node=0x20932c8) at ../../../src/include/executor/executor.h:248
> #12 0x00000000006e77de in ExecAppend (pstate=0x2089208) at nodeAppend.c:267
> #13 0x00000000006d1b00 in ExecProcNodeInstr (node=0x2089208) at execProcnode.c:466
> #14 0x000000000070964f in ExecProcNode (node=0x2089208) at ../../../src/include/executor/executor.h:248
> #15 0x0000000000709795 in ExecSort (pstate=0x2088ff8) at nodeSort.c:108
> #16 0x00000000006d1b00 in ExecProcNodeInstr (node=0x2088ff8) at execProcnode.c:466
> #17 0x00000000006d1ad1 in ExecProcNodeFirst (node=0x2088ff8) at execProcnode.c:450
> #18 0x00000000006dec36 in ExecProcNode (node=0x2088ff8) at ../../../src/include/executor/executor.h:248
> #19 0x00000000006df079 in fetch_input_tuple (aggstate=0x2088a20) at nodeAgg.c:589
> #20 0x00000000006e1fad in agg_retrieve_direct (aggstate=0x2088a20) at nodeAgg.c:2368
> #21 0x00000000006e1bfd in ExecAgg (pstate=0x2088a20) at nodeAgg.c:2183
> #22 0x00000000006d1b00 in ExecProcNodeInstr (node=0x2088a20) at execProcnode.c:466
> #23 0x00000000006d1ad1 in ExecProcNodeFirst (node=0x2088a20) at execProcnode.c:450
> #24 0x00000000006c6ffa in ExecProcNode (node=0x2088a20) at ../../../src/include/executor/executor.h:248
> #25 0x00000000006c966b in ExecutePlan (estate=0x2032f48, planstate=0x2088a20, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0xbb3400 <donothingDR>,
> execute_once=true) at execMain.c:1632
>
> #3 0x00000000009fb149 in text_to_cstring (t=0x2aaae8023010) at varlena.c:212
> 212 result = (char *) palloc(len + 1);
>
> (gdb) l
> 207 /* must cast away the const, unfortunately */
> 208 text *tunpacked = pg_detoast_datum_packed(unconstify(text *, t));
> 209 int len = VARSIZE_ANY_EXHDR(tunpacked);
> 210 char *result;
> 211
> 212 result = (char *) palloc(len + 1);
>
> (gdb) p len
> $1 = -4
>
> This VM had some issue early today and I killed the VM, causing PG to execute
> recovery. I'm tentatively blaming that on zfs, so this could conceivably be a
> data error (although recovery supposedly would have resolved it). I just
> checked and data_checksums=off.
>

This looks very much like a corrupted varlena header. The length (-4) is
clearly bogus, and it's what triggers the problem: len + 1 is -3, which
wraps around to 18446744073709551613 (0xFFFFFFFFFFFFFFFD) when it is
passed to palloc() as an unsigned size.

This has to be a value stored in a table, not some intermediate value
created during execution. So I don't think the exact query matters. Can
you try doing something like pg_dump, which has to detoast everything?

The question is whether this is due to the VM getting killed in some
strange way (what VM system is this, how is the storage mounted?) or
whether the recovery is borked and failed to do the right thing.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
