Re: [HACKERS] parallelize queries containing initplans

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallelize queries containing initplans
Date: 2017-11-16 17:14:42
Message-ID: CA+TgmoZLmoSF_8HVC6A2a2vE9efGm9BPDFfi7VFohHvuKGu4WQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 16, 2017 at 5:23 AM, Kuntal Ghosh
<kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
> I've tested the above-mentioned scenario with this patch and it is
> working fine. Also, I've created a text column named 'vartext',
> inserted some random length texts(max length 100) and tweaked the
> above query as follows:
> select ten,count(*) from tenk1 a where a.ten in (select
> b.ten from tenk1 b where (select a.vartext from tenk1 c where c.ten =
> a.ten limit 1) = b.vartext limit 1) group by a.ten;
> This query is equivalent to select ten,count(*) from tenk1 group by
> a.ten. It also produced the expected result without throwing any
> error.

Great! I have committed the patch; thanks for testing.

As I said in the commit message, there's a lot more work that could be
done here. I think we should consider trying to revise this whole
system so that instead of serializing the values and passing them to
the workers, we allocate an array of slots where each slot has a Datum
flag, an isnull flag, and a dsa_pointer (which maybe could be union'd
to the Datum?). If we're passing a parameter by value, we could just
store it in the Datum field; if it's null, we just set isnull. If
it's being passed by reference, we dsa_allocate() space for it, copy
it into that space, and then store the dsa_pointer.

The advantage of this is that it would be possible to imagine the
contents of a slot changing while parallelism is running, which
doesn't really work with the current serialized-blob representation.
That would in turn allow us to imagine letting parallel-safe InitPlans
being evaluated by the first participant that needs the value rather
than before launching workers, which would be good, not only because
of the possibility of deferring work for InitPlans attached at or
above the Gather but also because it could be used for InitPlans below
the Gather (as long as they don't depend on any parameters computed
below the Gather). I thought about trying to refactor this patch to
use a slot concept like what I just mentioned even before committing
it, but after poking at it a bit I decided that was going to require
too much new development to justify cramming it into the same patch.
Still, it seems like a good idea to do it at some point (when we have
lots of free time?).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2017-11-16 17:31:23 Re: [HACKERS] Proposal: generic WAL compression
Previous Message Fabrízio de Royes Mello 2017-11-16 17:11:43 Re: pgsql: Add hooks for session start and session end