Re: asynchronous execution

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp
Cc: robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: asynchronous execution
Date: 2017-02-22 08:39:45
Message-ID: 20170222.173945.262776579.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Thu, 16 Feb 2017 21:06:00 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170216(dot)210600(dot)214980879(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > #3 0x00000000006883ed in ExecAsyncEventWait (estate=0x13c01b8,
> > timeout=-1) at execAsync.c:345
>
> This means that no pending FDW scan moved itself to the waiting
> stage, so the whole thing gets stuck. This happens if no one is
> actually waiting for a result. I suppose that all of the foreign
> scans ran on the same connection. Anyway, it should be a mistake
> in the state transition. I'll look into it.
...
> > I was running a query whose plan looked like:
> >
> > explain (costs off) select tableoid::regclass, a, min(b), max(b) from ptab
> > group by 1,2 order by 1;
> >                          QUERY PLAN
> > ------------------------------------------------------
> >  Sort
> >    Sort Key: ((ptab.tableoid)::regclass)
> >    ->  HashAggregate
> >          Group Key: (ptab.tableoid)::regclass, ptab.a
> >          ->  Result
> >                ->  Append
> >                      ->  Foreign Scan on ptab_00001
> >                      ->  Foreign Scan on ptab_00002
> >                      ->  Foreign Scan on ptab_00003
> >                      ->  Foreign Scan on ptab_00004
> >                      ->  Foreign Scan on ptab_00005
> >                      ->  Foreign Scan on ptab_00006
> >                      ->  Foreign Scan on ptab_00007
> >                      ->  Foreign Scan on ptab_00008
> >                      ->  Foreign Scan on ptab_00009
> >                      ->  Foreign Scan on ptab_00010
> > <snip>
> >
> > The snipped part contains Foreign Scans on 90 more foreign partitions (in
> > fact, I could see the crash even with 10 foreign table partitions for the
> > same query).
>
> Yeah, it seems unrelated to how many there are.

In the end, I couldn't reproduce the crash for what is (probably)
the same case. I can think of two possible causes. One is a
situation where node->as_nasyncpending differs from
estate->es_num_pending_async, but I couldn't find any way for that
to happen. The other is a situation in postgresIterateForeignScan
where the "next owner" reaches EOF but another waiter does not. I
haven't reproduced that situation, but I fixed the code to handle
it. In addition, I found a bug in ExecAsyncAppendResponse: it
calls bms_add_member in an inappropriate way.

> Mmm, I reproduced it quite easily. A silly bug.
>
> Something bad is happening between freeing the ExecutorState memory
> context and the resource owner. Perhaps the ExecutorState is freed by
> the resowner (as a part of its ancestors) before the memory for the
> WaitEventSet is freed. It was careless of me. I'll reconsider it.

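The cause was that the WaitEventSet was allocated in the
ExecutorState memory context but registered with
TopTransactionResourceOwner, so the lifetime of its memory did not
match the lifetime of the resource owner responsible for releasing
it. I fixed it.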

For now, these fixes are made on top of the previous patches. In
the attached files, 0008 and 0009 fix the second bug, 0012 fixes
the first bug, and 0013 fixes the bms_add_member bug.

Sorry for the messy patch set; I will send a neater one
soon.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0013-Fix-a-bug-of-a-usage-of-bms_add_member.patch text/x-patch 1.0 KB
0012-Fix-a-possible-bug.patch text/x-patch 1.1 KB
0011-Some-non-functional-fixes.patch text/x-patch 15.8 KB
0010-Fix-a-typo-of-mcxt.c.patch text/x-patch 936 bytes
0009-Fix-the-resource-owner-to-be-used.patch text/x-patch 2.1 KB
0008-Allow-wait-event-set-to-be-registered-to-resource-ow.patch text/x-patch 4.4 KB
0007-Add-instrumentation-to-async-execution.patch text/x-patch 2.9 KB
0006-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch text/x-patch 1.3 KB
0005-Use-resource-owner-to-prevent-wait-event-set-from-le.patch text/x-patch 8.9 KB
0004-Make-postgres_fdw-async-capable.patch text/x-patch 43.2 KB
0003-Modify-async-execution-infrastructure.patch text/x-patch 29.7 KB
0002-Fix-some-bugs.patch text/x-patch 16.3 KB
0001-robert-s-2nd-framework.patch text/x-patch 42.9 KB
