Re: Parallel Full Hash Join

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Full Hash Join
Date: 2021-04-06 20:56:19
Message-ID: CALNJ-vS5GELo_=kJ7wNjKVPe5qWLXGfLABTC8hhCW1Y3Ls9BJw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 6, 2021 at 11:59 AM Melanie Plageman <melanieplageman(at)gmail(dot)com>
wrote:

> On Fri, Apr 2, 2021 at 3:06 PM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> >
> > Hi,
> > For v6-0003-Parallel-Hash-Full-Right-Outer-Join.patch
> >
> > + * current_chunk_idx: index in current HashMemoryChunk
> >
> > The above comment seems to be a better fit for
> > ExecScanHashTableForUnmatched() instead of
> > ExecParallelPrepHashTableForUnmatched().
> > I wonder where current_chunk_idx should belong (considering the above
> > comment and what the code does).
> >
> > +    while (hashtable->current_chunk_idx < hashtable->current_chunk->used)
> > ...
> > +    next = hashtable->current_chunk->next.unshared;
> > +    hashtable->current_chunk = next;
> > +    hashtable->current_chunk_idx = 0;
> >
> > Each time we advance to the next chunk, current_chunk_idx is reset. It
> > seems current_chunk_idx could be placed inside the chunk.
> > Maybe the consideration is that, with the current arrangement, we save
> > space by keeping the current_chunk_idx field at a higher level.
> > If that is the case, a comment should be added.
> >
>
> Thank you for the review. I think that moving the current_chunk_idx into
> the HashMemoryChunk would probably take up too much space.
>
> In other places where we loop through the tuples in a chunk, we are able
> to just keep a local idx, like here in
> ExecParallelHashIncreaseNumBuckets():
>
>     case PHJ_GROW_BUCKETS_REINSERTING:
>         ...
>         while ((chunk = ExecParallelHashPopChunkQueue(hashtable, &chunk_s)))
>         {
>             size_t  idx = 0;
>
>             while (idx < chunk->used)
>
> but, since we cannot do that while also emitting tuples, I thought,
> let's just stash the index in the hashtable for use in serial hash join
> and the batch accessor for parallel hash join. A comment to this effect
> sounds good to me.
>
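
Thanks for the explanation. To make sure I follow the trade-off, here is a
stand-alone sketch of the two scan shapes (Chunk and Table below are only
stand-ins for HashMemoryChunk and HashJoinTable; this is not the patch code,
and advancing by the actual tuple size is elided):

/*
 * Stand-alone sketch of the two scan shapes discussed above.  Chunk and
 * Table stand in for HashMemoryChunk and HashJoinTable; none of this is
 * the actual patch code.
 */
#include <stdbool.h>
#include <stddef.h>

typedef struct Chunk
{
    struct Chunk *next;
    size_t        used;          /* amount of tuple data stored in the chunk */
} Chunk;

typedef struct Table
{
    Chunk  *chunks;              /* head of the chunk list */

    /* scan position that must survive between calls while emitting tuples */
    Chunk  *current_chunk;
    size_t  current_chunk_idx;
} Table;

/*
 * One-shot pass: the whole chunk list is consumed in a single call, so a
 * local index (as in the PHJ_GROW_BUCKETS_REINSERTING loop quoted above)
 * is sufficient.
 */
static void
one_shot_pass(Table *table)
{
    for (Chunk *chunk = table->chunks; chunk != NULL; chunk = chunk->next)
    {
        size_t  idx = 0;

        while (idx < chunk->used)
        {
            /* ... process the tuple at idx ... */
            idx++;               /* real code advances by the tuple's size */
        }
    }
}

/*
 * Resumable pass: control returns to the caller for every emitted tuple,
 * so the position lives in the table (or, for parallel hash join, in the
 * batch accessor) instead of in a local variable.
 */
static bool
resumable_pass(Table *table)
{
    while (table->current_chunk != NULL)
    {
        while (table->current_chunk_idx < table->current_chunk->used)
        {
            table->current_chunk_idx++;  /* real code advances by tuple size */
            return true;         /* pretend the tuple was unmatched: emit it */
        }
        table->current_chunk = table->current_chunk->next;
        table->current_chunk_idx = 0;
    }
    return false;                /* scan complete */
}

The second shape is the one that needs the stashed index, since every emitted
tuple hands control back to the caller.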

Given the way HashJoinTable is used, I don't have a better idea w.r.t. the
location of current_chunk_idx.
It is not worth introducing another level of mapping between HashJoinTable
and the chunk index.

So the current arrangement is fine with an additional comment
in ParallelHashJoinBatchAccessor (the current comment doesn't explicitly
mention current_chunk_idx).
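
Something along these lines would be enough, I think (illustrative wording
only -- the actual field name and placement are whatever the patch uses):

    /*
     * current_chunk_idx: position of the unmatched-tuple scan within the
     * chunk currently being scanned.  Kept in the hashtable (serial hash
     * join) or in the batch accessor (parallel hash join) rather than in
     * HashMemoryChunk, to avoid growing every chunk; reset to 0 whenever
     * the scan advances to the next chunk.
     */
    size_t      current_chunk_idx;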

Cheers
