Re: gs_group_1 crashing on 13beta2/s390x

From: Christoph Berg <myon(at)debian(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gs_group_1 crashing on 13beta2/s390x
Date: 2020-09-25 14:30:32
Message-ID: 20200925143032.GG293907@msg.df7cb.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Re: Tom Lane
> > Tom> It's hardly surprising that datumCopy would segfault when given a
> > Tom> null "value" and told it is pass-by-reference. However, to get to
> > Tom> the datumCopy call, we must have passed the MemoryContextContains
> > Tom> check on that very same pointer value, and that would surely have
> > Tom> segfaulted as well, one would think.
>
> > Nope, because MemoryContextContains just returns "false" if passed a
> > NULL pointer.
>
> Ah, right. So you could imagine getting here if the finalfn had returned
> PointerGetDatum(NULL) with isnull = false. We have some aggregate
> transfns that are capable of doing that for internal-type transvalues,
> I think, but the finalfn never should do it.

So I had another stab at this. As expected, the 13.0 upload to
Debian/unstable crashed again on the buildd, while a manual
everything-should-be-the-same build succeeded. I don't know why I
didn't try this before, but this time I took this manual build and
started a PG instance from it. Pasting the gs_group_1 queries made it
segfault instantly.

So here we are:

#0 datumCopy (value=0, typLen=-1, typByVal=false) at ./build/../src/backend/utils/adt/datum.c:142
#1 0x000002aa3bf6322e in datumCopy (value=<optimized out>, typByVal=<optimized out>, typLen=<optimized out>)
at ./build/../src/backend/utils/adt/datum.c:178
#2 0x000002aa3bda6dd6 in finalize_aggregate (aggstate=aggstate(at)entry=0x2aa3defbfd0, peragg=peragg(at)entry=0x2aa3e0671f0,
pergroupstate=pergroupstate(at)entry=0x2aa3e026b78, resultVal=resultVal(at)entry=0x2aa3e067108, resultIsNull=0x2aa3e06712a)
at ./build/../src/backend/executor/nodeAgg.c:1153

(gdb) p *resultVal
$3 = 0
(gdb) p *resultIsNull
$6 = false

(gdb) p *peragg
$7 = {aggref = 0x2aa3deef218, transno = 2, finalfn_oid = 0, finalfn = {fn_addr = 0x0, fn_oid = 0, fn_nargs = 0, fn_strict = false,
fn_retset = false, fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, numFinalArgs = 1, aggdirectargs = 0x0,
resulttypeLen = -1, resulttypeByVal = false, shareable = true}

Since finalfn_oid is 0, resultVal/resultIsNull were set by the `else`
branch of the if (OidIsValid) in finalize_aggregate():

else
{
/* Don't need MakeExpandedObjectReadOnly; datumCopy will copy it */
*resultVal = pergroupstate->transValue;
*resultIsNull = pergroupstate->transValueIsNull;
}

(gdb) p *pergroupstate
$12 = {transValue = 0, transValueIsNull = false, noTransValue = false}

That comes from finalize_aggregates:

#3 0x000002aa3bda7e10 in finalize_aggregates (aggstate=aggstate(at)entry=0x2aa3defbfd0, peraggs=peraggs(at)entry=0x2aa3e067140,
pergroup=0x2aa3e026b58) at ./build/../src/backend/executor/nodeAgg.c:1369

/*
* Run the final functions.
*/
for (aggno = 0; aggno < aggstate->numaggs; aggno++)
{
AggStatePerAgg peragg = &peraggs[aggno];
int transno = peragg->transno;
AggStatePerGroup pergroupstate;

pergroupstate = &pergroup[transno];

if (DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit))
finalize_partialaggregate(aggstate, peragg, pergroupstate,
&aggvalues[aggno], &aggnulls[aggno]);
else
finalize_aggregate(aggstate, peragg, pergroupstate,
&aggvalues[aggno], &aggnulls[aggno]);
}

... but at that point I'm lost. Maybe "aggno" and "transno" got mixed
up here?

(I'll leave the gdb session open for further suggestions.)

Christoph

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Christoph Berg 2020-09-25 14:37:35 Re: gs_group_1 crashing on 13beta2/s390x
Previous Message Justin Pryzby 2020-09-25 14:30:00 Re: Probable documentation errors or improvements