Quick Links

Re: [HACKERS] Parallel Hash take II

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject:	Re: [HACKERS] Parallel Hash take II
Date:	2017-12-12 09:46:11
Message-ID:	CAEepm=09rr65VN+cAV5FgyM_z=D77Xy8Fuc9CDDDYbq3pQUezg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sat, Dec 2, 2017 at 3:46 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Looking at 0004-Add-shared-tuplestores.patch
>
> Comments:
> - I'd rename mutex to lock. Seems quite possible that we end up with shared
> lockers too.

Done.

> - Expand "Simple mechanism for sharing tuples between backends." intro
> comment - that doesn't really seem like a meaningful description of
> the mechanism. Should probably mention that it's similar to
> tuplestores etc...

Done.

> - I'm still concerned about the chunking mechanism. How about this
> sketch of an alternative solution?
>
> Chunks are always the same length. To avoid having to read the length
> from disk while holding a lock, introduce continuation chunks which
> store the amount of space needed by the overlarge tuple at the
> start. The reading process stays largely the same, except that if a
> backend reads a chunk that's a continuation, it reads the length of
> the continuation and skips ahead. That could mean that more than one
> backend read continuation chunks, but the window is small and there's
> normally not goign to be that many huge tuples (otherwise things are
> slow anyway).

Done.

I've also included a simple test harness that can be used to drive
SharedTuplestore independently of Parallel Hash, but that patch is not
for commit. See example of usage in the commit message.
(Incidentally I noticed that ParallelWorkerNumber is not marked
PGDLLIMPORT so that fails to build on Windows CI.)

> - why are we using a separate hardcoded 32 for sts names? Why not just
> go for NAMEDATALEN or such?

Done.

> - I'd replace most of the "should's" in comments with "need".

Done.

Another problem I discovered is that v27's way of installing a
different function into ExecProcNode in ExecInitHashJoin() was broken:
it didn't allow for the possibility that there is no DSA area
available due to lack of resources. ExecInitNode() is too soon to
decide. My solution is to provide a way for executor nodes to change
their ExecProcNode functions at any later time, which requires a way
for execProcnode.c to redo any wrapper functions.

--
Thomas Munro
http://www.enterprisedb.com

Attachment	Content-Type	Size
parallel-hash-v28.patchset.tgz	application/x-gzip	46.0 KB

In response to

Re: [HACKERS] Parallel Hash take II at 2017-12-08 00:00:03 from Andres Freund

Responses

Re: [HACKERS] Parallel Hash take II at 2017-12-13 22:45:07 from Andres Freund
Re: [HACKERS] Parallel Hash take II at 2017-12-14 09:12:38 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Rushabh Lathia	2017-12-12 10:09:29	Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Previous Message	Amit Langote	2017-12-12 09:43:57	Re: Boolean partitions syntax