Re: gs_group_1 crashing on 13beta2/s390x

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Christoph Berg <myon(at)debian(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gs_group_1 crashing on 13beta2/s390x
Date: 2020-09-25 18:05:16
Message-ID: CAEudQArWoTTCXy0q7E-QtevPA0cW0cebZOOunyEJ2GbzoVDf0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Em sex., 25 de set. de 2020 às 14:36, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
escreveu:

> Em sex., 25 de set. de 2020 às 11:30, Christoph Berg <myon(at)debian(dot)org>
> escreveu:
>
>> Re: Tom Lane
>> > > Tom> It's hardly surprising that datumCopy would segfault when given
>> a
>> > > Tom> null "value" and told it is pass-by-reference. However, to get
>> to
>> > > Tom> the datumCopy call, we must have passed the
>> MemoryContextContains
>> > > Tom> check on that very same pointer value, and that would surely
>> have
>> > > Tom> segfaulted as well, one would think.
>> >
>> > > Nope, because MemoryContextContains just returns "false" if passed a
>> > > NULL pointer.
>> >
>> > Ah, right. So you could imagine getting here if the finalfn had
>> returned
>> > PointerGetDatum(NULL) with isnull = false. We have some aggregate
>> > transfns that are capable of doing that for internal-type transvalues,
>> > I think, but the finalfn never should do it.
>>
>> So I had another stab at this. As expected, the 13.0 upload to
>> Debian/unstable crashed again on the buildd, while a manual
>> everything-should-be-the-same build succeeded. I don't know why I
>> didn't try this before, but this time I took this manual build and
>> started a PG instance from it. Pasting the gs_group_1 queries made it
>> segfault instantly.
>>
>> So here we are:
>>
>> #0 datumCopy (value=0, typLen=-1, typByVal=false) at
>> ./build/../src/backend/utils/adt/datum.c:142
>> #1 0x000002aa3bf6322e in datumCopy (value=<optimized out>,
>> typByVal=<optimized out>, typLen=<optimized out>)
>> at ./build/../src/backend/utils/adt/datum.c:178
>> #2 0x000002aa3bda6dd6 in finalize_aggregate (aggstate=aggstate(at)entry=0x2aa3defbfd0,
>> peragg=peragg(at)entry=0x2aa3e0671f0,
>> pergroupstate=pergroupstate(at)entry=0x2aa3e026b78,
>> resultVal=resultVal(at)entry=0x2aa3e067108, resultIsNull=0x2aa3e06712a)
>> at ./build/../src/backend/executor/nodeAgg.c:1153
>>
>> (gdb) p *resultVal
>> $3 = 0
>> (gdb) p *resultIsNull
>> $6 = false
>>
>> (gdb) p *peragg
>> $7 = {aggref = 0x2aa3deef218, transno = 2, finalfn_oid = 0, finalfn =
>> {fn_addr = 0x0, fn_oid = 0, fn_nargs = 0, fn_strict = false,
>> fn_retset = false, fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt =
>> 0x0, fn_expr = 0x0}, numFinalArgs = 1, aggdirectargs = 0x0,
>> resulttypeLen = -1, resulttypeByVal = false, shareable = true}
>>
>> Since finalfn_oid is 0, resultVal/resultIsNull were set by the `else`
>> branch of the if (OidIsValid) in finalize_aggregate():
>>
>> else
>> {
>> /* Don't need MakeExpandedObjectReadOnly; datumCopy will copy it
>> */
>> *resultVal = pergroupstate->transValue;
>> *resultIsNull = pergroupstate->transValueIsNull;
>> }
>>
>> (gdb) p *pergroupstate
>> $12 = {transValue = 0, transValueIsNull = false, noTransValue = false}
>>
> Here transValueIsNull shouldn't be "true"?
> thus, DatumCopy would be protected, for this test: "!*resultIsNull"
>
Observe this excerpt (line 1129):
/* don't call a strict function with NULL inputs */
*resultVal = (Datum) 0;
*resultIsNull = true;

Now, it does not contradict this principle.
If all the values are null, they should be filled with True (1),
and not 0 (false)?

Line (4711), function ExecReScanAgg:
MemSet(econtext->ecxt_aggvalues, 0, sizeof(Datum) * node->numaggs);
MemSet(econtext->ecxt_aggnulls, true, sizeof(bool) * node->numaggs);

zero, here, mean False, aggvalues is Null? Not.

regards,
Ranier Vilela

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-09-25 18:11:46 Re: gs_group_1 crashing on 13beta2/s390x
Previous Message Grigory Smolkin 2020-09-25 17:58:07 Re: history file on replica and double switchover