Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2017-11-14 09:41:39
Message-ID: CAGPqQf17vxDXFi=u9yf=aDF_0LJ+NGmRKJ0fUJYVUtzseQm6Zw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Peter and Thomas for the review comments.

On Wed, Nov 1, 2017 at 3:59 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Thu, Oct 26, 2017 at 4:22 AM, Rushabh Lathia
> <rushabh(dot)lathia(at)gmail(dot)com> wrote:
> > Attaching the re based patch according to the v22 parallel-hash patch
> sets
>
> I took a quick look at this today, and noticed a few issues:
>
> * make_name() is used to name files in sharedtuplestore.c, which is
> what is passed to BufFileOpenShared() for parallel hash join. Your
> using your own logic for that within the equivalent logtape.c call to
> BufFileOpenShared(), presumably because make_name() wants to identify
> participants by PID rather than by an ordinal identifier number.
>
> I think that we need some kind of central registry for things that use
> shared buffiles. It could be that sharedtuplestore.c is further
> generalized to support this, or it could be that they both call
> something else that takes care of naming. It's not okay to have this
> left to random chance.
>
> You're going to have to ask Thomas about this. You should also use
> MAXPGPATH for the char buffer on the stack.
>
>
Used MAXPGPATH for the char buffer.

> * This logtape.c comment needs to be updated, as it's no longer true:
>
>
* successfully. In general, workers can take it that the leader will
> * reclaim space in files under their ownership, and so should not
> * reread from tape.
>
>
Done.

> * Robert hated the comment changes in the header of nbtsort.c. You
> might want to change it back, because he is likely to be the one that
> commits this.
>
>
* You should look for similar comments in tuplesort.c (IIRC a couple
> of places will need to be revised).
>
>
Pending.

> * tuplesort_begin_common() should actively reject a randomAccess
> parallel case using elog(ERROR).
>
>
Done.

> * tuplesort.h should note that randomAccess isn't supported, too.
>
>
Done.

> * What's this all about?:
>
> + /* Accessor for the SharedBufFileSet that is at the end of Sharedsort. */
> + #define GetSharedBufFileSet(shared) \
> + ((BufFileSet *) (&(shared)->tapes[(shared)->nTapes]))
>
> You can't just cast from one type to the other without regard for the
> underling size of the shared memory buffer, which is what this looks
> like to me. This only fails to crash because you're only abusing the
> last member in the tapes array for this purpose, and there happens to
> be enough shared memory slop that you get away with it. I'm pretty
> sure that ltsUnify() ends up clobbering the last/leader tape, which is
> a place where BufFileSet is also used, so this is just wrong. You
> should rethink the shmem structure a little bit.
>
>
Fixed this by adding a SharedFileSet directly into the Sharedsort struct.

Thanks Thomas Munro for the offline help here.

* There is still that FIXME comment within leader_takeover_tapes(). I
> believe that you should still have a leader tape (at least in local
> memory in the leader), even though you'll never be able to do anything
> with it, since randomAccess is no longer supported. You can remove the
> FIXME, and just note that you have a leader tape to be consistent with
> the serial case, though recognize that it's not useful. Note that even
> with randomAccess, we always had the leader tape, so it's not that
> different, really.
>
>
Done.

> I suppose it might make sense to make shared->tapes not have a leader
> tape. It hardly matters -- perhaps you should leave it there in order
> to keep the code simple, as you'll be keeping the leader tape in local
> memory, too. (But it still won't fly to continue to clobber it, of
> course -- you still need to find a dedicated place for BufFileSet in
> shared memory.)
>
>
Attaching the latest patch (v13) here and I will continue working on the
comment
improvement part for nbtsort.c and tuplesort.c. Also will perform more
testing
with the attached patch.

Patch is re-base of v25 patch set of Parallel hash.

Thanks,
Rushabh Lathia
www.EnterpriseDB.com

Attachment Content-Type Size
0001-Add-parallel-B-tree-index-build-sorting_v13.patch text/x-patch 150.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2017-11-14 09:51:47 Re: [HACKERS] Proposal: Local indexes for partitioned table
Previous Message Moon Insung 2017-11-14 09:36:30 RE: [HACKERS][PATCH]pg_buffercache add a buffer state column, Add fuction to decode buffer state