Re: Server crash due to assertion failure in CheckOpSlotCompatibility()

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Server crash due to assertion failure in CheckOpSlotCompatibility()
Date: 2019-05-30 11:01:39
Message-ID: CAE9k0Pn=Hud4OMf_3Hr9o6Vhv5pxJAq0nJj8rFzLXY03RgFUcw@mail.gmail.com
Lists: pgsql-hackers

Hi All,

Here are some more details on the crash reported in my previous e-mail for
better clarity:

The crash only happens when a *primary key* or *btree index* is created on
the test table. For example, consider the following two scenarios.

*TC1: With PK*
create table t1(a int primary key, b text);
insert into t1 values (1, 'aa'), (2, 'bb'), (3, 'aa'), (4, 'bb');
select a, b, array_agg(a order by a) from t1 group by grouping sets ((a),
(b));

TC1 is the problematic case. The explain plan for the query causing the
crash is as follows:

postgres=# explain select a, b, array_agg(a order by a) from t1 group by
grouping sets ((a), (b));
                                 QUERY PLAN
-----------------------------------------------------------------------------
 GroupAggregate  (cost=0.15..166.92 rows=1470 width=68)
   Group Key: a
   Sort Key: b
     Group Key: b
   ->  Index Scan using t1_pkey on t1  (cost=0.15..67.20 rows=1270 width=36)
(5 rows)

*TC2: Without PK/Btree index*
create table t2(a int, b text);
insert into t2 values (1, 'aa'), (2, 'bb'), (3, 'aa'), (4, 'bb');
select a, b, array_agg(a order by a) from t2 group by grouping sets ((a),
(b));

And here is the explain plan for the query in TC2, which doesn't cause any
crash:

postgres=# explain select a, b, array_agg(a order by a) from t2 group by
grouping sets ((a), (b));
                            QUERY PLAN
-------------------------------------------------------------------
 GroupAggregate  (cost=88.17..177.69 rows=400 width=68)
   Group Key: a
   Sort Key: b
     Group Key: b
   ->  Sort  (cost=88.17..91.35 rows=1270 width=36)
         *Sort Key: a*
         ->  Seq Scan on t2  (cost=0.00..22.70 rows=1270 width=36)
(7 rows)

The difference between the two plans is that, in TC1, an Index Scan is
performed on the test table. As the data in the btree index is already
sorted, *no* sorting is done when the grouping aggregate is performed on
column 'a' (you can see that "*Sort Key: a*" is missing in the explain plan
for TC1). For that reason, the slot is expected to contain a heap tuple, but
as the slots are fetched from the tuplesort object, it actually contains a
minimal tuple. In TC2, on the other hand, sorting is done for both groups
(i.e. both "Sort Key: b" and "Sort Key: a" exist), so the expected slot is
always the minimal slot and there is no assertion failure.
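
To make the mismatch concrete, here is a small standalone C sketch of the
check as I have described it above. It is only a toy model and not
PostgreSQL code: every Toy*/toy_* name is made up for illustration, and the
rule for which slot kind arrives for which grouping set simply follows the
description in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a slot's callback table (cf. slot->tts_ops). */
typedef struct ToySlotOps
{
    const char *name;
} ToySlotOps;

const ToySlotOps toy_heap_ops = {"heap tuple"};
const ToySlotOps toy_minimal_ops = {"minimal tuple"};

/* Hypothetical slot: only remembers which kind of tuple it holds. */
typedef struct ToySlot
{
    const ToySlotOps *tts_ops;
} ToySlot;

/* Hypothetical fetch step: the slot kind assumed when the expression was built. */
typedef struct ToyFetchStep
{
    const ToySlotOps *kind;
} ToyFetchStep;

/*
 * Slot kind feeding the aggregate, as described in this mail: the first
 * grouping set reads from the child plan node, later ones re-read the
 * tuples from a tuplesort, which hands back minimal tuples.
 */
const ToySlotOps *
toy_input_kind(bool child_is_sort, int grouping_set)
{
    assert(grouping_set >= 1);
    if (grouping_set > 1)
        return &toy_minimal_ops;
    return child_is_sort ? &toy_minimal_ops : &toy_heap_ops;
}

/* Same comparison as the failing Assert, but returned instead of aborting. */
bool
toy_slot_is_compatible(const ToyFetchStep *op, const ToySlot *slot)
{
    return op->kind == slot->tts_ops;
}

/* True if the kind fixed for the first grouping set also fits the second. */
bool
toy_second_set_ok(bool child_is_sort)
{
    ToyFetchStep op = {toy_input_kind(child_is_sort, 1)};
    ToySlot slot = {toy_input_kind(child_is_sort, 2)};

    return toy_slot_is_compatible(&op, &slot);
}
```

In this model, toy_second_set_ok(false) corresponds to TC1 (Index Scan
below the aggregate) and reports a mismatch, while toy_second_set_ok(true)
corresponds to TC2 (Sort below the aggregate) and reports a match.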

Thanks,

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Wed, May 29, 2019 at 5:50 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
wrote:

> Hi All,
>
> I'm getting a server crash when executing the following test-case:
>
> create table t1(a int primary key, b text);
> insert into t1 values (1, 'aa'), (2, 'bb'), (3, 'aa'), (4, 'bb');
> select a, b, array_agg(a order by a) from t1 group by grouping sets ((a),
> (b));
>
> *Backtrace:*
> #0 0x00007f37d0630277 in raise () from /lib64/libc.so.6
> #1 0x00007f37d0631968 in abort () from /lib64/libc.so.6
> #2 0x0000000000a5685e in ExceptionalCondition (conditionName=0xc29fd0
> "!(op->d.fetch.kind == slot->tts_ops)", errorType=0xc29cc1
> "FailedAssertion",
> fileName=0xc29d09 "execExprInterp.c", lineNumber=1905) at assert.c:54
> #3 0x00000000006dfa2b in CheckOpSlotCompatibility (op=0x2e84e38,
> slot=0x2e6e268) at execExprInterp.c:1905
> #4 0x00000000006dd446 in ExecInterpExpr (state=0x2e84da0,
> econtext=0x2e6d8e8, isnull=0x7ffe53cba4af) at execExprInterp.c:439
> #5 0x00000000007010e5 in ExecEvalExprSwitchContext (state=0x2e84da0,
> econtext=0x2e6d8e8, isNull=0x7ffe53cba4af)
> at ../../../src/include/executor/executor.h:307
> #6 0x0000000000701be7 in advance_aggregates (aggstate=0x2e6d6b0) at
> nodeAgg.c:679
> #7 0x0000000000703a5d in agg_retrieve_direct (aggstate=0x2e6d6b0) at
> nodeAgg.c:1847
> #8 0x00000000007034da in ExecAgg (pstate=0x2e6d6b0) at nodeAgg.c:1572
> #9 0x00000000006e797f in ExecProcNode (node=0x2e6d6b0) at
> ../../../src/include/executor/executor.h:239
> #10 0x00000000006ea174 in ExecutePlan (estate=0x2e6d458,
> planstate=0x2e6d6b0, use_parallel_mode=false, operation=CMD_SELECT,
> sendTuples=true,
> numberTuples=0, direction=ForwardScanDirection, dest=0x2e76b30,
> execute_once=true) at execMain.c:1648
> #11 0x00000000006e7f91 in standard_ExecutorRun (queryDesc=0x2e7b3b8,
> direction=ForwardScanDirection, count=0, execute_once=true) at
> execMain.c:365
> #12 0x00000000006e7dc7 in ExecutorRun (queryDesc=0x2e7b3b8,
> direction=ForwardScanDirection, count=0, execute_once=true) at
> execMain.c:309
> #13 0x00000000008e40c7 in PortalRunSelect (portal=0x2e10bc8, forward=true,
> count=0, dest=0x2e76b30) at pquery.c:929
> #14 0x00000000008e3d66 in PortalRun (portal=0x2e10bc8,
> count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2e76b30,
> altdest=0x2e76b30,
> completionTag=0x7ffe53cba850 "") at pquery.c:770
>
> The following Assert statement in *CheckOpSlotCompatibility*() fails.
>
> 1905 Assert(op->d.fetch.kind == slot->tts_ops);
>
> The above Assert statement was added by you as part of the following git
> commit.
>
> commit 15d8f83128e15de97de61430d0b9569f5ebecc26
> Author: Andres Freund <andres(at)anarazel(dot)de>
> Date: Thu Nov 15 22:00:30 2018 -0800
>
> Verify that expected slot types match returned slot types.
>
> This is important so JIT compilation knows what kind of tuple slot the
> deforming routine can expect. There's also optimization potential for
> expression initialization without JIT compilation. It e.g. seems
> plausible to elide EEOP_*_FETCHSOME ops entirely when dealing with
> virtual slots.
>
> Author: Andres Freund
>
> *Analysis:*
> I did some quick investigation on this and found that when the aggregate
> is performed on the first group, i.e. group by 'a', all the input tuples
> are fetched from the outer plan and stored into the tuplesort object. For
> the subsequent groups, i.e. from the second group onwards, the tuples
> stored in the tuplesort object during the 1st phase are used. But the
> tuples stored in the tuplesort object are actually minimal tuples, whereas
> a heap tuple is expected, which results in the assertion failure.
>
> I might be wrong, but it seems to me that the slot fetched from the
> tuplesort object needs to be converted to a heap tuple. Actually, the
> following lines of code in agg_retrieve_direct() get executed only when we
> have crossed a group boundary. I think at least the function call to
> ExecCopySlotHeapTuple(outerslot); followed by ExecForceStoreHeapTuple();
> should always happen, irrespective of whether the group boundary is
> crossed or not... Sorry if I'm saying something ...
>
>      * If we are grouping, check whether we've crossed a group
>      * boundary.
>      */
>     if (node->aggstrategy != AGG_PLAIN)
>     {
>         tmpcontext->ecxt_innertuple = firstSlot;
>         if (!ExecQual(aggstate->phase->eqfunctions[node->numCols - 1],
>                       tmpcontext))
>         {
>             aggstate->grp_firstTuple = ExecCopySlotHeapTuple(outerslot);
>             break;
>         }
>     }
>
> --
> With Regards,
> Ashutosh Sharma
> EnterpriseDB: http://www.enterprisedb.com
>
