Server crash due to assertion failure in CheckOpSlotCompatibility()

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>
Subject: Server crash due to assertion failure in CheckOpSlotCompatibility()
Date: 2019-05-29 12:20:35
Message-ID: CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com
Lists: pgsql-hackers

Hi All,

I'm getting a server crash when executing the following test case:

create table t1(a int primary key, b text);
insert into t1 values (1, 'aa'), (2, 'bb'), (3, 'aa'), (4, 'bb');
select a, b, array_agg(a order by a) from t1 group by grouping sets ((a), (b));

*Backtrace:*
#0 0x00007f37d0630277 in raise () from /lib64/libc.so.6
#1 0x00007f37d0631968 in abort () from /lib64/libc.so.6
#2 0x0000000000a5685e in ExceptionalCondition (conditionName=0xc29fd0 "!(op->d.fetch.kind == slot->tts_ops)", errorType=0xc29cc1 "FailedAssertion", fileName=0xc29d09 "execExprInterp.c", lineNumber=1905) at assert.c:54
#3 0x00000000006dfa2b in CheckOpSlotCompatibility (op=0x2e84e38, slot=0x2e6e268) at execExprInterp.c:1905
#4 0x00000000006dd446 in ExecInterpExpr (state=0x2e84da0, econtext=0x2e6d8e8, isnull=0x7ffe53cba4af) at execExprInterp.c:439
#5 0x00000000007010e5 in ExecEvalExprSwitchContext (state=0x2e84da0, econtext=0x2e6d8e8, isNull=0x7ffe53cba4af) at ../../../src/include/executor/executor.h:307
#6 0x0000000000701be7 in advance_aggregates (aggstate=0x2e6d6b0) at nodeAgg.c:679
#7 0x0000000000703a5d in agg_retrieve_direct (aggstate=0x2e6d6b0) at nodeAgg.c:1847
#8 0x00000000007034da in ExecAgg (pstate=0x2e6d6b0) at nodeAgg.c:1572
#9 0x00000000006e797f in ExecProcNode (node=0x2e6d6b0) at ../../../src/include/executor/executor.h:239
#10 0x00000000006ea174 in ExecutePlan (estate=0x2e6d458, planstate=0x2e6d6b0, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x2e76b30, execute_once=true) at execMain.c:1648
#11 0x00000000006e7f91 in standard_ExecutorRun (queryDesc=0x2e7b3b8, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:365
#12 0x00000000006e7dc7 in ExecutorRun (queryDesc=0x2e7b3b8, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:309
#13 0x00000000008e40c7 in PortalRunSelect (portal=0x2e10bc8, forward=true, count=0, dest=0x2e76b30) at pquery.c:929
#14 0x00000000008e3d66 in PortalRun (portal=0x2e10bc8, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2e76b30, altdest=0x2e76b30, completionTag=0x7ffe53cba850 "") at pquery.c:770

The following Assert statement in *CheckOpSlotCompatibility*() fails.

1905 Assert(op->d.fetch.kind == slot->tts_ops);

The above Assert statement was added by you as part of the following git
commit:

commit 15d8f83128e15de97de61430d0b9569f5ebecc26
Author: Andres Freund <andres(at)anarazel(dot)de>
Date: Thu Nov 15 22:00:30 2018 -0800

Verify that expected slot types match returned slot types.

This is important so JIT compilation knows what kind of tuple slot the
deforming routine can expect. There's also optimization potential for
expression initialization without JIT compilation. It e.g. seems
plausible to elide EEOP_*_FETCHSOME ops entirely when dealing with
virtual slots.

Author: Andres Freund

*Analysis:*
I did some quick investigation on this and found that when the aggregate is
performed for the first grouping set, i.e. (a), all the input tuples are
fetched from the outer plan and stored in the tuplesort object, and for the
subsequent grouping sets, i.e. from the second one onwards, the tuples stored
in the tuplesort object during the first phase are used. However, the tuples
stored in the tuplesort object are minimal tuples, whereas the expression
expects a heap tuple, and this mismatch results in the assertion failure.

I might be wrong, but it seems to me that the slot fetched from the tuplesort
object needs to be converted to a heap tuple. Currently, the following lines
of code in agg_retrieve_direct() get executed only when we have crossed a
group boundary. I think that at least the call to
ExecCopySlotHeapTuple(outerslot), followed by ExecForceStoreHeapTuple(),
should always happen, irrespective of whether the group boundary is crossed
or not... Sorry if I'm saying something ...

/*
 * If we are grouping, check whether we've crossed a group
 * boundary.
 */
if (node->aggstrategy != AGG_PLAIN)
{
    tmpcontext->ecxt_innertuple = firstSlot;
    if (!ExecQual(aggstate->phase->eqfunctions[node->numCols - 1],
                  tmpcontext))
    {
        aggstate->grp_firstTuple = ExecCopySlotHeapTuple(outerslot);
        break;
    }
}

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
