Re: Parallel bitmap heap scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel bitmap heap scan
Date: 2016-10-20 04:37:22
Message-ID: CAA4eK1+04U34haLjpaPkEpV2nPmj0NPmswYjZVXY0vo6=GoGrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 19, 2016 at 9:23 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> On Wed, Oct 19, 2016 at 12:39 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> Try measuring with something more heavy on bitmap scan time
>> itself. E.g.
>> SELECT SUM(l_extendedprice) FROM lineitem WHERE (l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date);
>> or similar. The tpch queries don't actually spend that much time in the
>> bitmapscan itself - the parallization of the rest of the query is what
>> matters...
>
> Yeah, I agree.
>
> I have tested with this query, with exact filter condition it was
> taking parallel sequence scan, so I have modified the filter a bit and
> tested.
>
> Tested with all default configuration in my local machine. I think I
> will generate more such test cases and do detail testing in my
> performance machine.
>
>
> Explain Analyze results:
> ---------------------------------
> On Head:
> ------------
> postgres=# explain analyze SELECT SUM(l_extendedprice) FROM lineitem
> WHERE (l_shipdate >= '1995-01-01'::date) AND (l_shipdate <=
> '1996-03-31'::date);
>
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=848805.90..848805.91 rows=1 width=32) (actual
> time=12440.165..12440.166 rows=1 loops=1)
> -> Bitmap Heap Scan on lineitem (cost=143372.40..834833.25
> rows=5589057 width=8) (actual time=1106.217..11183.722 rows=5678841
> loops=1)
> Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND
> (l_shipdate <= '1996-03-31'::date))
> Rows Removed by Index Recheck: 20678739
> Heap Blocks: exact=51196 lossy=528664
> -> Bitmap Index Scan on idx_lineitem_shipdate
> (cost=0.00..141975.13 rows=5589057 width=0) (actual
> time=1093.376..1093.376 rows=5678841 loops=1)
> Index Cond: ((l_shipdate >= '1995-01-01'::date) AND
> (l_shipdate <= '1996-03-31'::date))
> Planning time: 0.185 ms
> Execution time: 12440.819 ms
> (9 rows)
>
> After Patch:
> ---------------
> postgres=# explain analyze SELECT SUM(l_extendedprice) FROM lineitem
> WHERE (l_shipdate >= '1995-01-01'::date) AND (l_shipdate <=
> '1996-03-31'::date);
>
> QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> ---------
> Finalize Aggregate (cost=792751.16..792751.17 rows=1 width=32)
> (actual time=6660.157..6660.157 rows=1 loops=1)
> -> Gather (cost=792750.94..792751.15 rows=2 width=32) (actual
> time=6659.378..6660.117 rows=3 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> Partial Aggregate (cost=791750.94..791750.95 rows=1
> width=32) (actual time=6655.941..6655.941 rows=1 loops=3)
> -> Parallel Bitmap Heap Scan on lineitem
> (cost=143372.40..785929.00 rows=2328774 width=8) (actual
> time=1980.797..6204.232 rows=1892947 loops=
> 3)
> Recheck Cond: ((l_shipdate >= '1995-01-01'::date)
> AND (l_shipdate <= '1996-03-31'::date))
> Rows Removed by Index Recheck: 6930269
> Heap Blocks: exact=17090 lossy=176443
> -> Bitmap Index Scan on idx_lineitem_shipdate
> (cost=0.00..141975.13 rows=5589057 width=0) (actual
> time=1933.454..1933.454 rows=5678841
> loops=1)
> Index Cond: ((l_shipdate >=
> '1995-01-01'::date) AND (l_shipdate <= '1996-03-31'::date))
> Planning time: 0.207 ms
> Execution time: 6669.195 ms
> (13 rows)
>
>
> Summary:
> -> With patch overall execution is 2 time faster compared to head.
> -> Bitmap creation with patch is bit slower compared to head and thats
> because of DHT vs efficient hash table.
>

I think here the impact of slowness due to Bitmap Index Scan is not
much visible, as the time it takes as compare to overall time is less.
However, I think there is an advantage of using DHT as that will allow
us to build the hash table by multiple workers using parallel index
scan in future.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2016-10-20 05:11:06 Re: FSM corruption leading to errors
Previous Message Amit Kapila 2016-10-20 03:26:40 Re: Avoiding pin scan during btree vacuum