Re: parallel foreign scan

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: m(dot)kniep(at)web(dot)de
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: parallel foreign scan
Date: 2018-05-16 02:09:02
Message-ID: 20180516.110902.216853091.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Tue, 15 May 2018 23:09:31 +0200, Manuel Kniep <m(dot)kniep(at)web(dot)de> wrote in <D84E3D72-2E83-482B-8EF8-D25F93F1CEA8(at)web(dot)de>
> Dear hackers,
>
> I’m working on a foreign database wrapper for Kafka [1]
> Now I am trying to make it parallel aware. Following
> the documentation [2]
> However it seems that I can’t make it use more than a
> single worker with force_parallel_mode = on.
>
> I wonder if I need to do more than just implementing the
> needed callback function to benefit from multiple workers.
>
> Looking at create_foreignscan_path in path_nodes.c
> I found that the ForeignPath seems to always set
>
> pathnode->path.parallel_aware = false;
> pathnode->path.parallel_safe = rel->consider_parallel;
> pathnode->path.parallel_workers = 0;
>
> Do I need so set these in my GetForeignPaths callback manually?

Right. create_foreignscan_path is used by FDW drivers to create
the path struct. GetForeignPaths() needs to finish the path by
setting those parameters and adding partial paths.

# I haven't done that myself, so I'm not sure of the details.
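
To illustrate the idea only (this is not real FDW code): the
sketch below uses simplified stand-in structs instead of the real
planner headers, and the function names
make_default_foreign_path and finish_as_partial_path are made up
for this example. In a real driver the second step would be
applied to the ForeignPath returned by create_foreignscan_path,
and the path would then be handed to add_partial_path(). The
worker count is an assumption the driver has to choose itself.

```c
#include <stdbool.h>

/* Simplified stand-ins for the planner structs.
 * These are NOT the real PostgreSQL definitions; they only carry
 * the three fields discussed in this thread. */
typedef struct Path
{
    bool parallel_aware;
    bool parallel_safe;
    int  parallel_workers;
} Path;

typedef struct RelOptInfo
{
    bool consider_parallel;
} RelOptInfo;

/* Mimics the defaults quoted above from create_foreignscan_path. */
static Path
make_default_foreign_path(const RelOptInfo *rel)
{
    Path p;

    p.parallel_aware = false;
    p.parallel_safe = rel->consider_parallel;
    p.parallel_workers = 0;
    return p;
}

/* Hypothetical helper: what GetForeignPaths() would do afterwards
 * to turn the path into a parallel-aware partial path.
 * Returns false (and leaves the path untouched) when the relation
 * is not parallel safe or no workers are requested. */
static bool
finish_as_partial_path(Path *p, int workers)
{
    if (!p->parallel_safe || workers <= 0)
        return false;

    p->parallel_aware = true;
    p->parallel_workers = workers;
    /* A real driver would now call add_partial_path(rel, path). */
    return true;
}
```

Note that the helper refuses to mark the path when
consider_parallel is false, since a partial path has to be
parallel safe in the first place.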

> Is there anything else I need to do?

I think you are trying to collect data from multiple Kafka
servers. This means each server has a dedicated foreign table on
a dedicated foreign server. Parallel execution doesn't fit that
case, since it works on a single base relation (that is, a
table). Parallel Append/Merge Append looks a bit different, but
is actually the same in the sense that one base relation is
scanned by multiple workers. Even if you are trying to fetch from
one Kafka stream with multiple workers, I think the FDW driver
doesn't support parallel scanning anyway.

In any case, you will have to modify the FDW driver.

If you are trying to collect data from multiple servers, the
following proposed PoC patch is an implementation of asynchronous
execution for postgres_fdw, and it might be helpful.

https://www.postgresql.org/message-id/20180515.202945.69332784.horiguchi.kyotaro@lab.ntt.co.jp

The postgres_fdw.c part of it is complicated because it supports
shared connections, but it is not that complex if you ignore that.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
