Re: asynchronous and vectorized execution

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: asynchronous and vectorized execution
Date: 2016-07-07 17:59:54
Message-ID: CA+TgmobD9uM9=zVz+jvTyEM_o9rwDP3RBJkJPzb0HCpR9-085A@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 6, 2016 at 3:29 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> This seems to be a good opportunity to show this patch. The
> attached patch set does async execution of foreignscan
> (postgres_fdw) on Robert's first infrastructure, with some
> modifications.

Cool.

> ExecAsyncWaitForNode can get into an infinite wait through
> recursive calls of ExecAsyncWaitForNode caused by ExecProcNode
> being called from async-unaware nodes. Such recursive calls cause
> a wait on already-ready nodes.

Hmm, that's annoying.

> I solved that in the patch set by allocating a separate
> async-execution context for each async-execution subtree, which
> is made by ExecProcNode, instead of one async-exec context for
> the whole execution tree. This works fine, but the way it switches
> contexts seems ugly. This may also be solved by making
> ExecAsyncWaitForNode return when there is no node to wait for,
> even if the waiting node is not ready. This might keep the
> async-exec context (state) simpler, so I'll try this.

I think you should instead try to make ExecAsyncWaitForNode properly reentrant.
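
To make the reentrancy point concrete, here is a minimal sketch under stated assumptions. It is a toy model, not the patch and not PostgreSQL's executor: ToyNode, ToyAsyncState, toy_process_one_event and toy_wait_for_node are all hypothetical names invented for illustration. The idea it shows is that the wait loop keeps no private assumptions across a nested call: all bookkeeping lives in shared state, a node is dequeued before anything can recurse, and the target's readiness is re-checked before every blocking step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; these are NOT PostgreSQL's executor structures. */
typedef struct ToyNode
{
    bool            ready;   /* has this node's result arrived? */
    struct ToyNode *needs;   /* node that must be ready first, or NULL */
} ToyNode;

typedef struct ToyAsyncState
{
    ToyNode *pending[8];     /* nodes with an outstanding request (a stack) */
    int      npending;
    int      events;         /* number of simulated wait events consumed */
} ToyAsyncState;

void toy_wait_for_node(ToyAsyncState *st, ToyNode *target);

/* Simulate one blocking wait: the most recently queued node gets its event. */
void
toy_process_one_event(ToyAsyncState *st)
{
    ToyNode *n;

    /* Blocking with nothing pending is exactly the infinite wait to avoid. */
    assert(st->npending > 0);
    n = st->pending[--st->npending];    /* dequeue before any nested call */
    st->events++;

    /* Stand-in for an async-unaware node calling back into the wait loop. */
    if (n->needs != NULL && !n->needs->ready)
        toy_wait_for_node(st, n->needs);

    n->ready = true;
}

void
toy_wait_for_node(ToyAsyncState *st, ToyNode *target)
{
    /*
     * Re-check readiness on every iteration: a nested call made while
     * processing an event may already have completed the target.
     */
    while (!target->ready)
        toy_process_one_event(st);
}
```

In this toy, if node A needs node B and the outer caller waits on B, processing A's event recurses into the wait for B and completes it. The outer loop then sees B ready and returns instead of blocking again. A loop that blocked first and checked afterwards would trip the assertion, which stands in for the infinite wait described above.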

> Does ParallelWorkerSetLatchesForGroup use a mutex or semaphore
> or something similar instead of latches?

Why would it do that?

>> BTW, we also need to benchmark those changes to add the parent
>> pointers and change the return conventions and see if they have any
>> measurable impact on performance.
>
> I have to bring you some bad news.
>
> With the attached patch, an append on four foreign scans on one
> (local) server performs about 10% faster, and about twice as fast
> for three or four foreign scans on separate foreign servers
> (connections), but only when compiled with -O0. I found that the
> patched version gains almost nothing from compiler optimization,
> while the unpatched version gets faster.

Two things:

1. That's not the scenario I'm talking about. I'm concerned about
making sure that query plans that don't use asynchronous execution
don't get slower.

2. I have to believe that's a defect in your implementation rather
than something intrinsic, or maybe your test scenario is bad. It's
very hard - really impossible - to believe that all queries involving
FDW pushdown are locally CPU-bound.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
