Re: aggregate crash

From: Andres Freund <andres(at)anarazel(dot)de>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: aggregate crash
Date: 2020-01-03 19:35:39
Message-ID: 20200103193539.mzimh6ufogyppsrl@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-12-27 20:13:26 +0300, Teodor Sigaev wrote:
> Found crash on production instance, assert-enabled build crashes in pfree()
> call, with default config. v11, v12 and head are affected, but, seems, you
> need to be a bit lucky.
>
> The bug is comparing old and new aggregate pass-by-ref values only by
> pointer value itself, despite on null flag. Any function which returns null
> doesn't worry about actual returned Datum value, so that comparison isn't
> enough. Test case shows bug with ExecInterpExpr() but there several similar
> places (thanks Nikita Glukhov for help).
> Attached patch adds check of null flag.

Hm. I don't understand the problem here. Why do we need to reparent in
that case? What freed the relevant value?

Nor do I really understand why v10 wouldn't be affected if this actually
is a problem. The relevant code is also only guarded by
DatumGetPointer(newVal) != DatumGetPointer(pergroupstate->transValue))

>
> Backtrace from v12 (note, newValue and oldValue are differ on current call,
> but oldValue points into pfreed memory) :
> #0 0x0000000000c8405a in GetMemoryChunkContext (pointer=0x80a808250) at
> ../../../../src/include/utils/memutils.h:130
> 130 AssertArg(MemoryContextIsValid(context));
> (gdb) bt
> #0 0x0000000000c8405a in GetMemoryChunkContext (pointer=0x80a808250) at
> ../../../../src/include/utils/memutils.h:130
> #1 0x0000000000c85ae5 in pfree (pointer=0x80a808250) at mcxt.c:1058
> #2 0x000000000080475e in ExecAggTransReparent (aggstate=0x80a806370,
> pertrans=0x80a87e830, newValue=34535940744, newValueIsNull=false,
> oldValue=34535932496, oldValueIsNull=false)
> at execExprInterp.c:4209
> #3 0x00000000007ff51f in ExecInterpExpr (state=0x80a87f4d8,
> econtext=0x80a8065a8, isnull=0x7fffffffd7b7) at execExprInterp.c:1747
> #4 0x000000000082c12b in ExecEvalExprSwitchContext (state=0x80a87f4d8,
> econtext=0x80a8065a8, isNull=0x7fffffffd7b7) at
> ../../../src/include/executor/executor.h:308
> #5 0x000000000082bc0f in advance_aggregates (aggstate=0x80a806370) at nodeAgg.c:679
> #6 0x000000000082b8a6 in agg_retrieve_direct (aggstate=0x80a806370) at
> nodeAgg.c:1847
> #7 0x0000000000828782 in ExecAgg (pstate=0x80a806370) at nodeAgg.c:1572
> #8 0x000000000080e712 in ExecProcNode (node=0x80a806370) at
> ../../../src/include/executor/executor.h:240

> How to reproduce:
> http://sigaev.ru/misc/xdump.sql.bz2
> bzcat xdump.sql.bz2 | psql postgres && psql postgres < x.sql

It should be possible to create a smaller reproducer... It'd be good if
a bug fix for this were committed with a regression test.

> diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
> index 034970648f3..3b5333716d4 100644
> --- a/src/backend/executor/execExprInterp.c
> +++ b/src/backend/executor/execExprInterp.c
> @@ -1743,7 +1743,8 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
> * expanded object that is already a child of the aggcontext,
> * assume we can adopt that value without copying it.
> */
> - if (DatumGetPointer(newVal) != DatumGetPointer(pergroup->transValue))
> + if (DatumGetPointer(newVal) != DatumGetPointer(pergroup->transValue) ||
> + fcinfo->isnull != pergroup->transValueIsNull)
> newVal = ExecAggTransReparent(aggstate, pertrans,
> newVal, fcinfo->isnull,
> pergroup->transValue,

I'd really like to avoid adding additional branches to these paths.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2020-01-03 19:35:59 Re: backup manifests
Previous Message Robert Haas 2020-01-03 19:32:01 Re: Greatest Common Divisor