Re: Parallel bitmap heap scan

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel bitmap heap scan
Date: 2017-02-28 15:48:26
Message-ID: CAFiTN-su5SW6ekgrOYV7SLMNDuZzMT4wqP1Sp1=NLXpFAGnUQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Sun, Feb 26, 2017 at 9:26 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I'm not entirely sure about the calling convention for
> tbm_free_shared_area() but the rest seems OK.

Changed
>
>> 2. Now tbm_free is not freeing any of the shared members which can be
>> accessed by other worker so tbm_free is safe to call from
>> ExecEndBitmapHeapScan without any safety check or ref count.
>
> That also seems fine. We ended up with something very similar in the
> Parallel Index Scan patch.
>
>> 0002:
>> 1. We don't need ExecShutdownBitmapHeapScan anymore because now we are
>> not freeing any shared member in ExecEndBitmapHeapScan.
>> 2. In ExecReScanBitmapHeapScan we will call tbm_free_shared_area to
>> free the shared members of the TBM.
>> 3. After that, we will free TBMSharedIteratorState what we allocated
>> using tbm_prepare_shared_iterate.
>
> Check. But I think tbm_free_shared_area() should also free the object
> itself, instead of making the caller do that separately.

Right, done that way.
>
> + if (DsaPointerIsValid(node->pstate->tbmiterator))
> + {
> + /* First we free the shared TBM members using the shared state */
> + tbm_free_shared_area(dsa, node->pstate->tbmiterator);
> + dsa_free(dsa, node->pstate->tbmiterator);
> + }
> +
> + if (DsaPointerIsValid(node->pstate->prefetch_iterator))
> + dsa_free(dsa, node->pstate->prefetch_iterator);
>
> The fact that these cases aren't symmetric suggests that your
> abstraction is leaky. I'm guessing that you can't call
> tbm_free_shared_area because the two iterators share one copy of the
> underlying iteration arrays, and the TBM code isn't smart enough to
> avoid freeing them twice. You're going to have to come up with a
> better solution to that problem; nodeBitmapHeapScan.c shouldn't know
> about the way the underlying storage details are managed. (Maybe you
> need to reference-count the iterator arrays?)

Converted iterator arrays to structure and maintained the refcount. I
had to do the same thing for pagetable also because that is also
shared across iterator.

>
> + if (node->inited)
> + goto start_iterate;
>
> My first programming teacher told me not to use goto. I've
> occasionally violated that rule, but I need a better reason than you
> have here. It looks very easy to avoid.

Changed
>
> + pbms_set_parallel(outerPlanState(node));
>
> I think this should be a flag in the plan, and the planner should set
> it correctly, instead of having it be a flag in the executor that the
> executor sets. Also, the flag itself should really be called
> something that involves the word "shared" rather than "parallel",
> because the bitmap will not be created in parallel, but it will be
> shared.
Done
>
> Have you checked whether this patch causes any regression in the
> non-parallel case? It adds a bunch of "if" statements, so it might.
> Hopefully not, but it needs to be carefully tested.

With new patch I tested in my local machine, perform multiple
executions and it doesn't show any regression. Attached file
perf_result contains the explain analyze output for one of the query
(head, patch with 0 workers and patch with 2 workers). I will perform
the test with all the TPC-H queries which using bitmap plan on the
bigger machine and will post the results next week.
>
> @@ -48,10 +48,11 @@
> #include "utils/snapmgr.h"
> #include "utils/tqual.h"
>
> -
> static TupleTableSlot *BitmapHeapNext(BitmapHeapScanState *node);
>
> Unnecessary.
Fixed
>
> +static bool pbms_is_leader(ParallelBitmapState *pbminfo);
> +static void pbms_set_parallel(PlanState *node);
>
> I don't think this "pbms" terminology is very good. It's dissimilar
> to the way other functions in this file are named. Maybe
> BitmapShouldInitializeSharedState().
Changed
>
> I think that some of the bits that this function makes conditional on
> pstate should be factored out into inline functions. Like:
>
> - if (node->prefetch_target >= node->prefetch_maximum)
> - /* don't increase any further */ ;
> - else if (node->prefetch_target >= node->prefetch_maximum / 2)
> - node->prefetch_target = node->prefetch_maximum;
> - else if (node->prefetch_target > 0)
> - node->prefetch_target *= 2;
> - else
> - node->prefetch_target++;
> + if (!pstate)
> + {
> + if (node->prefetch_target >= node->prefetch_maximum)
> + /* don't increase any further */ ;
> + else if (node->prefetch_target >= node->prefetch_maximum / 2)
> + node->prefetch_target = node->prefetch_maximum;
> + else if (node->prefetch_target > 0)
> + node->prefetch_target *= 2;
> + else
> + node->prefetch_target++;
> + }
> + else if (pstate->prefetch_target < node->prefetch_maximum)
> + {
> + SpinLockAcquire(&pstate->prefetch_mutex);
> + if (pstate->prefetch_target >= node->prefetch_maximum)
> + /* don't increase any further */ ;
> + else if (pstate->prefetch_target >=
> + node->prefetch_maximum / 2)
> + pstate->prefetch_target = node->prefetch_maximum;
> + else if (pstate->prefetch_target > 0)
> + pstate->prefetch_target *= 2;
> + else
> + pstate->prefetch_target++;
> + SpinLockRelease(&pstate->prefetch_mutex);
> + }
>
> I suggest creating an inline function like BitmapAdjustPrefetch() for
> this logic, and letting the code call that. The function can look
> something like this: if (pstate == NULL) { non-parallel stuff; return;
> } parallel stuff follows...
>
> And similarly for the other cases where you've made the logic
> conditional. This will make it more clear what's happening
> post-patch, I think, and will also help keep the level of indentation
> from getting out-of-control in certain places. In fact, maybe you
> should submit a preliminary refactoring patch that moves these chunks
> of logic into functions and then the main patch can apply over top of
> that.

Done.

>
> + bool inited;
>
> Suggest: initialized

Done
>
> - * ----------------
> + * pscan_len size of the shared memory for parallel bitmap
> + * inited is node is ready to iterate
> + * stbmiterator shared iterator
> + * sprefetch_iterator shared iterator for prefetching
> + * pstate shared state for parallel bitmap scan
> + *----------------
>
> No need to change number of dashes.
Fixed
>
> + * PBMState information : Current status of the TIDBitmap creation during
> + * parallel bitmap heap scan.
>
> If you look for existing places where comments are formatted like
> this, I bet you won't find many. Copy the surrounding style more.

Done as surrounding structure, also changed the name to
SharedBitmapState, I think that is better name for the purpose.

>
> + dsa_pointer tbmiterator;
> + dsa_pointer prefetch_iterator;
> + slock_t prefetch_mutex;
> + int prefetch_pages;
> + int prefetch_target;
> + bool prefetching;
> + slock_t state_mutex;
> + PBMState state;
> + ConditionVariable cv;
> + char phs_snapshot_data[FLEXIBLE_ARRAY_MEMBER];
>
> I think it is probably not a good idea to have two separate mutexes
> here. They'll be in the same cache line, so it won't be much faster
> than having one mutex, and the state mutex won't be acquired very
> often so you won't really gain anything anyway. I think you can just
> merge the mutexes into one called 'mutex'.

Done.
>
> + /* Allow only one process to prefetch */
>
> If this is a good idea, there should be a comment explaining why.

Done
>
> + TBMSharedIterator *stbmiterator;
> + TBMSharedIterator *sprefetch_iterator;
>
> Maybe shared_iterator and shared_prefetch_iterator.
Done

0001- same as previous with some changes for freeing the shared memory stuff.
0002- nodeBitmapHeapScan refactoring, this applies independently
0003- actual parallel bitmap stuff applies on top of 0001 and 0002

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-tidbitmap-support-shared-v6.patch application/octet-stream 25.3 KB
0002-bitmap-heapscan-refactoring_v6.patch application/octet-stream 4.9 KB
0003-parallel-bitmap-heapscan-v6.patch application/octet-stream 38.1 KB
perf_result application/octet-stream 5.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2017-02-28 15:58:51 Re: Proposal : Parallel Merge Join
Previous Message Stephen Frost 2017-02-28 15:39:01 Re: IF (NOT) EXISTS in psql-completion