Re: [HACKERS] Parallel Hash take II

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: [HACKERS] Parallel Hash take II
Date: 2017-12-21 08:49:49
Message-ID: 20171221084949.k36qsg4t4xpkr3rh@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

> > Not really happy with the name. ExecParallelHashTableInsert() calling
> > ExecParallelHashLoadTuple() to insert a tuple into the hashtable doesn't
> > quite strike me as right; the naming similarity to
> > ExecParallelHashTableLoad seems problematic too.
> > ExecParallelHashAllocTuple() or such?
> >
> > One could argue it'd not be a bad idea to keep a similar split as
> > dense_alloc() and memcpy() have, but I'm not really convinced by
> > myself. Hm.
>
> Yeah, the names are confusing. So:

Cool.

> > Just to make sure I understand: The "empty" phase is to protect the
> > process of the electee doing the sizing calculations etc? And the
> > reason that's not possible to do just by waiting for
> > PHJ_GROW_BATCHES_REPARTITIONING is that somebody could dynamically
> > arrive, i.e. it'd not be needed in a statically sized barrier?
>
> Yeah, it's where you wait for the serial phase above to be finished.
> [ Explanation ] Does that make sense?

Yes.

>
> > + /* reset temp memory each time to avoid leaks from qual expr */
> > + ResetExprContext(econtext);
> > +
> > + if (ExecQual(hjclauses, econtext))
> >
> > I personally think it's better to avoid this pattern and store the
> > result of the ExecQual() in a variable, ResetExprContext() and then act
> > on the result. No need to keep memory around for longer, and for bigger
> > contexts you're more likely to have all the metadata in cache.
> >
> > I'd previously thought about introducing ExecQualAndReset() or such...
>
> Makes sense, but this is code that is identical in
> ExecScanHashBucket() so I think we should keep it this way for now,
> and explore expression context lifetime improvements in a separate
> patch? Looks like the same change could be made in other nodes too.

Ok.

> > +
> > + /*
> > + * If we started up so late that the shared batches have been freed
> > + * already by ExecHashTableDetach(), then we are finished.
> > + */
> > + if (!DsaPointerIsValid(hashtable->parallel_state->batches))
> > + return false;
> >
> > This is really the only place that weird condition is detected? And why
> > is that possible in the first place? And if possible, shouldn't we have
> > detected earlier? Also, if possible, what prevents this to occur in a
> > way that test fails, because pstate->batches is freed, but not yet
> > reset?
>
> ExecParallelHashJoinNewBatch(), where this code appears, is generally
> the place that we discover that we're finished. Normally we discover
> that we're finished by seeing that there are no batches left to chew
> on, by inspecting the per-batch state in shmem. This weird condition
> arises when a worker starts up so late that the join is finished and
> the shmem space used to tracks batches has already been freed. I
> agree that that was badly explained and there was in fact something a
> bit kooky about that coding. I have now changed it so that
> ExecParallelHashEnsureBatchAccessors() detects this case and has a
> better comment to explain it, and ExecParallelHashJoinNewBatch() now
> just looks out for hashtable->batches == NULL with a comment referring
> to the other place.

Thanks for updating.

My compiler complained that ExecHashJoinImpl() might not be
inlinable. That's just because you declared it always_inline without
actually making it an inline function...

Pushed. Yeha! Congrats, this has been quite the project.

I suspect we'll find a bunch of problems, both on the planning and
execution side, but I think at this point we're much more likely to find
and resolve these in-tree vs. out of tree.

Btw, I see dsa_get_address() show up pretty prominently in profiles. I
kinda wonder if there's some cases where we could ameliorate the cost by
recognizing that a bunch of lookups are all going to reside in the same
segment.

- Andres

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2017-12-21 09:01:48 Re: [HACKERS] Runtime Partition Pruning
Previous Message Andres Freund 2017-12-21 08:49:46 pgsql: Add parallel-aware hash joins.