Re: Parallel bitmap heap scan

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel bitmap heap scan
Date: 2017-01-31 02:17:52
Message-ID: CAJrrPGe15zFv9OmhRUjbE1OJCwcJ0fGvGT_U1ZYhYm-0FfTtrg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 27, 2017 at 5:32 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:

> On Tue, Jan 24, 2017 at 10:18 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
> wrote:
> > I have changed as per the comments. 0002 and 0003 are changed, 0001 is
> > still the same.
>
> A couple of days back my colleague Rafia reported an issue (offlist) that
> the parallel bitmap node is not scaling as well as other nodes, e.g.
> parallel sequential scan and parallel index scan.
>
> After some perf analysis, I found that there was one unconditional
> spinlock in the parallel bitmap patch which we were taking to check
> the prefetch target. Basically, we were always taking the lock and
> checking whether prefetch_target had reached prefetch_maximum. So even
> after it reached prefetch_maximum, we would acquire the lock and
> recheck after every tuple. I have changed that logic: now I check the
> condition first, and only if we need to increase the target do I take
> the lock and recheck the condition.
>
> There is just a one-line change in 0003 compared to the older version;
> all the other patches are the same.
>
> Some performance data to show that the new parallel bitmap patch
> performs way better than the previous version.
> TPC-H scale 20, work_mem 64MB, shared_buffers 8GB, 4 workers,
> on an Intel 8-socket machine.
>
> Query    v13 (time in ms)    v14 (time in ms)
> Q6       37065.841           18202.903
> Q14      15229.569           11341.121

Thanks for the update. I have some comments.
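
The check-before-lock change you describe for the prefetch target sounds
like the right fix. For the archives, a minimal sketch of the pattern as I
understand it (the mutex name and exact field placement are illustrative,
not the patch's actual code):

	/* cheap unlocked check first; only take the lock when the target may grow */
	if (pbminfo->prefetch_target < node->prefetch_maximum)
	{
		SpinLockAcquire(&pbminfo->prefetch_mutex);
		/* recheck under the lock; another worker may have bumped it meanwhile */
		if (pbminfo->prefetch_target < node->prefetch_maximum)
			pbminfo->prefetch_target++;
		SpinLockRelease(&pbminfo->prefetch_mutex);
	}

That way, once the target has reached the maximum, workers stop touching
the lock on every tuple.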

0002-hash-support-alloc-free-v14.patch:

+ if (tb->alloc)
+ {

The memory for tb->alloc is always allocated, so is the if check still
required?
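
If the allocator is always set up at create time, the call sites could use
it unconditionally; a hypothetical sketch (the callback and argument names
are assumed for illustration, not the patch's own):

	/* hypothetical: tb->alloc is guaranteed non-NULL after creation */
	data = tb->alloc->HashAlloc(size, tb->alloc->args);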

0003-parallel-bitmap-heap-scan-v14.patch:

+ * and set parallel flag in lower level bitmap index scan. Later
+ * bitmap index node will use this flag to indicate tidbitmap that
+ * it needs to create an shared page table.
+ */

It would be better to mention where this flag is used, so that it is
clearer.

+ * Mark tidbitmap as shared and also store DSA area in it.
+ * marking tidbitmap as shared is indication that this tidbitmap
+ * should be created in shared memory. DSA area will be used for

The shared flag is set in the tidbitmap structure itself, but the comment
says that the tidbitmap should be created in shared memory. I think it is
the page table that needs to be created in shared memory. Providing some
more information here would be helpful.

- node->tbmres = tbmres = tbm_iterate(tbmiterator);
+ node->tbmres = tbmres = pbms_parallel_iterate(tbmiterator,
+ pbminfo ? &pbminfo->tbmiterator : NULL);

Instead of passing both the normal and parallel iterators to the function
and checking for NULL again inside it, how about just adding the check in
the caller itself? Or, if you prefer the current method, then please
explain the details of the inputs in the function header and add more
description.
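
For the archives, a minimal sketch of the caller-side alternative, using
only the names visible in the quoted diff:

	/* sketch: choose the iterator at the call site instead of inside */
	if (pbminfo)
		node->tbmres = tbmres = pbms_parallel_iterate(tbmiterator,
													  &pbminfo->tbmiterator);
	else
		node->tbmres = tbmres = tbm_iterate(tbmiterator);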

+ /* Increase prefetch target if it's not yet at the max. */
+ if (node->prefetch_target < node->prefetch_maximum)

I didn't evaluate all scenarios, but the above code may have a problem: in
parallel mode, prefetch_target is fetched from the node and not from
pbminfo, but later it is read from pbminfo and updated there. Maybe this
code needs a rewrite.
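
Something along these lines is what I would expect, a sketch under the
assumption that pbminfo carries the shared copy of prefetch_target (not
the patch's actual code):

	/* sketch: read the same copy of prefetch_target that will be updated */
	int		prefetch_target = pbminfo ? pbminfo->prefetch_target
									  : node->prefetch_target;

	if (prefetch_target < node->prefetch_maximum)
	{
		/* ... take the lock in parallel mode and bump the shared value ... */
	}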

+ TBMIterator *iterator = node->prefetch_iterator;

Why another variable? Why can't we use node->prefetch_iterator itself?
Currently node->tbmiterator and node->prefetch_iterator are initialized
irrespective of whether the scan is parallel or not, but at the point of
use there is a check that picks the parallel iterator in the parallel
case. If that is so, why can't we avoid the initialization itself?
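
That is, something like this hypothetical shape (the parallel begin-iterate
call is an assumed name, just to show the idea):

	/* sketch: initialize only the iterator that will actually be used */
	if (pbminfo)
		pbms_begin_iterate(...);	/* hypothetical parallel initializer */
	else
		node->prefetch_iterator = tbm_begin_iterate(tbm);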

+ else
+ needWait = false;

needWait is false by default, so just set it to true for the
PBM_INPROGRESS case.
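
i.e. something like this (PBM_INPROGRESS is from the patch; the
surrounding state field is assumed for illustration):

	bool	needWait = false;

	switch (pbminfo->state)
	{
		case PBM_INPROGRESS:
			needWait = true;
			break;
		default:
			/* needWait stays false */
			break;
	}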

+ * so that during free we can directly get the dsa_pointe and free it.

dsa_pointe -> dsa_pointer

+typedef struct
+{
+ TIDBitmap *tbm; /* TIDBitmap we're iterating over */
+ int spageptr; /* next spages index */
+ int schunkptr; /* next schunks index */
+ int schunkbit; /* next bit to check in current schunk */
+ TBMIterateResult output; /* MUST BE LAST (because variable-size) */
+} TBMIterator;

I didn't find the need for moving this structure. Can you point me to
where it is used?

Regards,
Hari Babu
Fujitsu Australia
