From: | Zhihong Yu <zyu(at)yugabyte(dot)com> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Full Hash Join |
Date: | 2021-04-06 20:56:19 |
Message-ID: | CALNJ-vS5GELo_=kJ7wNjKVPe5qWLXGfLABTC8hhCW1Y3Ls9BJw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Apr 6, 2021 at 11:59 AM Melanie Plageman <melanieplageman(at)gmail(dot)com>
wrote:
> On Fri, Apr 2, 2021 at 3:06 PM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> >
> > Hi,
> > For v6-0003-Parallel-Hash-Full-Right-Outer-Join.patch
> >
> > + * current_chunk_idx: index in current HashMemoryChunk
> >
> > The above comment seems to be better fit for
> ExecScanHashTableForUnmatched(), instead of
> ExecParallelPrepHashTableForUnmatched.
> > I wonder where current_chunk_idx should belong (considering the above
> comment and what the code does).
> >
> > + while (hashtable->current_chunk_idx <
> hashtable->current_chunk->used)
> > ...
> > + next = hashtable->current_chunk->next.unshared;
> > + hashtable->current_chunk = next;
> > + hashtable->current_chunk_idx = 0;
> >
> > Each time we advance to the next chunk, current_chunk_idx is reset. It
> seems current_chunk_idx can be placed inside chunk.
> > Maybe the consideration is that, with the current formation we save
> space by putting current_chunk_idx field at a higher level.
> > If that is the case, a comment should be added.
> >
>
> Thank you for the review. I think that moving the current_chunk_idx into
> the HashMemoryChunk would probably take up too much space.
>
> Other places that we loop through the tuples in the chunk, we are able
> to just keep a local idx, like here in
> ExecParallelHashIncreaseNumBuckets():
>
> case PHJ_GROW_BUCKETS_REINSERTING:
> ...
> while ((chunk = ExecParallelHashPopChunkQueue(hashtable,
> &chunk_s)))
> {
> size_t idx = 0;
>
> while (idx < chunk->used)
>
> but, since we cannot do that while also emitting tuples, I thought,
> let's just stash the index in the hashtable for use in serial hash join
> and the batch accessor for parallel hash join. A comment to this effect
> sounds good to me.
>
From the way HashJoinTable is used, I don't have better idea w.r.t. the
location of current_chunk_idx.
It is not worth introducing another level of mapping between HashJoinTable
and the chunk index.
So the current formation is fine with additional comment
in ParallelHashJoinBatchAccessor (current comment doesn't explicitly
mention current_chunk_idx).
Cheers
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2021-04-06 20:56:49 | Re: Autovacuum on partitioned table (autoanalyze) |
Previous Message | Dan Lynch | 2021-04-06 20:16:16 | Re: policies with security definer option for allowing inline optimization |