Re: [HACKERS] parallelize queries containing initplans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallelize queries containing initplans
Date: 2017-11-17 09:29:04
Message-ID: CAA4eK1KcAAjhfsDn2dnnV2L-mdDm1eraAR-5CcECK2kSMYyj_w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 16, 2017 at 10:44 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Nov 16, 2017 at 5:23 AM, Kuntal Ghosh
> <kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
>> I've tested the above-mentioned scenario with this patch and it is
>> working fine. Also, I've created a text column named 'vartext',
>> inserted some random-length texts (max length 100) and tweaked the
>> above query as follows:
>> select ten,count(*) from tenk1 a where a.ten in (select
>> b.ten from tenk1 b where (select a.vartext from tenk1 c where c.ten =
>> a.ten limit 1) = b.vartext limit 1) group by a.ten;
>> This query is equivalent to select ten,count(*) from tenk1 a group by
>> a.ten. It also produced the expected result without throwing any
>> error.
>
> Great! I have committed the patch; thanks for testing.
>

Thanks.

> As I said in the commit message, there's a lot more work that could be
> done here. I think we should consider trying to revise this whole
> system so that instead of serializing the values and passing them to
> the workers, we allocate an array of slots where each slot has a Datum,
> an isnull flag, and a dsa_pointer (which maybe could be union'd
> to the Datum?). If we're passing a parameter by value, we could just
> store it in the Datum field; if it's null, we just set isnull. If
> it's being passed by reference, we dsa_allocate() space for it, copy
> it into that space, and then store the dsa_pointer.
>
> The advantage of this is that it would be possible to imagine the
> contents of a slot changing while parallelism is running, which
> doesn't really work with the current serialized-blob representation.
> That would in turn allow us to imagine letting parallel-safe InitPlans
> be evaluated by the first participant that needs the value rather
> than before launching workers, which would be good, not only because
> of the possibility of deferring work for InitPlans attached at or
> above the Gather but also because it could be used for InitPlans below
> the Gather (as long as they don't depend on any parameters computed
> below the Gather).
>

That would be cool, but I think determining whether an InitPlan depends
on any parameter computed below the Gather could be tricky.
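
To make the slot idea above concrete, here is a rough, standalone sketch
of what such a slot could look like. It is not code from the committed
patch or from PostgreSQL: Datum and dsa_pointer are replaced by local
typedefs, toy_allocate()/toy_get_address() are hypothetical stand-ins for
dsa_allocate()/dsa_get_address(), and the byref flag is an extra
assumption added so the sketch can tell the two union members apart
without a type lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for PostgreSQL's Datum and dsa_pointer (assumptions). */
typedef uintptr_t Datum;
typedef uint64_t dsa_pointer;
#define InvalidDsaPointer ((dsa_pointer) 0)

/* Toy bump allocator; a hypothetical substitute for a real DSA area. */
static char toy_area[1024];
static size_t toy_used = 8;		/* offset 0 is reserved as "invalid" */

static dsa_pointer
toy_allocate(size_t size)
{
	size_t		aligned = (size + 7) & ~(size_t) 7;
	dsa_pointer result;

	if (toy_used + aligned > sizeof(toy_area))
		return InvalidDsaPointer;
	result = (dsa_pointer) toy_used;
	toy_used += aligned;
	return result;
}

static void *
toy_get_address(dsa_pointer p)
{
	return toy_area + p;
}

/* Hypothetical slot: isnull flag plus a Datum union'd with a dsa_pointer. */
typedef struct ParamSlot
{
	bool		isnull;
	bool		byref;			/* assumption: records which member is live */
	union
	{
		Datum		value;		/* pass-by-value parameter */
		dsa_pointer ptr;		/* pass-by-reference parameter */
	}			u;
} ParamSlot;

/* NULL parameter: only the flag is set. */
static void
slot_set_null(ParamSlot *slot)
{
	assert(slot != NULL);
	slot->isnull = true;
	slot->byref = false;
	slot->u.value = 0;
}

/* Pass-by-value parameter: store it directly in the Datum member. */
static void
slot_set_byval(ParamSlot *slot, Datum value)
{
	assert(slot != NULL);
	slot->isnull = false;
	slot->byref = false;
	slot->u.value = value;
}

/* Pass-by-reference parameter: copy into shared space, keep the pointer. */
static bool
slot_set_byref(ParamSlot *slot, const void *data, size_t len)
{
	dsa_pointer p;

	assert(slot != NULL && data != NULL);
	p = toy_allocate(len);
	if (p == InvalidDsaPointer)
		return false;
	memcpy(toy_get_address(p), data, len);
	slot->isnull = false;
	slot->byref = true;
	slot->u.ptr = p;
	return true;
}

/* Resolve a by-reference slot to an address in the toy area. */
static void *
slot_get_ref(const ParamSlot *slot)
{
	assert(slot != NULL && !slot->isnull && slot->byref);
	return toy_get_address(slot->u.ptr);
}
```

Because each slot is fixed-size and independent of the others, a
participant could in principle fill one in after the workers have
started, which is the property the serialized-blob representation lacks.
Synchronization between participants is left out of this sketch
entirely.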

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
