Re: Proposal: scan key push down to heap [WIP]

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: scan key push down to heap [WIP]
Date: 2016-11-13 05:16:28
Message-ID: CAFiTN-uaWpXHT-CZGgwudhsrfrKT1nr4TUCVRZKtyE9PLuL8_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have done performance analysis for TPCH queries, I saw visible gain
in 5 queries (10-25%).

Performance Data:

Benchmark : TPCH (S.F. 10)
shared_buffer : 20GB
work_mem : 50MB
Machine : POWER

Results are median of three run (explain analyze results for both
head/patch are attached in TPCH_out.tar).

Query Execution Time in (ms)
Head Patch Improvement
Q3 18475 16558 10%
Q4 7526 5856 22%
Q7 19386 17425 10%
Q10 16994 15019 11%
Q12 13852 10117 26%

Currently we had two major problems about this patch..

Problem1: As Andres has mentioned, HeapKeyTest uses heap_getattr,
whereas ExecQual use slot_getattr().So we can have worst case
performance problem when very less tuple are getting filter out and we
have table with many columns with qual on most of the columns.

Problem2. In HeapKeyTest we are under per_query_ctx, whereas in
ExecQual we are under per_tuple_ctx , so in former we can not afford
to have any palloc.

In this patch I have address both the concern by exposing executor
information to heap (I exposed per_tuple_ctx and slot to HeapDesc),
which is not a very good design.

I have other ideas in mind for solving these concerns, please provide
your thoughts..

For problem1 :
I think it's better to give task of key push down to optimizer, there
we can actually take the decision mostly based on two parameters.
1. Selectivity.
2. Column number on which qual is given.

For problem2 :
I think for solving this we need to limit the number of datatype we
pushdown to heap (I mean we can push all datatype which don't need
palloc in qual test).
1. Don't push down datatype with variable length.
2. Some other datatype with fixed length like 'name' can also
palloc (i.e nameregexeq). so we need to block them as well.

*Note: For exactly understanding which key is pushed down in below
attached exact analyze output, refer this example..

without patch:
-> Parallel Seq Scan on orders (cost=0.00..369972.00 rows=225038
width=20) (actual time=0.025..3216.157 rows=187204 loops=3)
Filter: ((o_orderdate >= '1995-01-01'::date) AND
(o_orderdate < '1995-04-01 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 4812796

with patch:
-> Parallel Seq Scan on orders (cost=0.00..369972.00 rows=225038
width=20) (actual time=0.015..1884.993 rows=187204 loops=3)
Filter: ((o_orderdate >= '1995-01-01'::date) AND
(o_orderdate < '1995-04-01 00:00:00'::timestamp without time zone))

1. So basically on head it shows how many rows are discarded by filter
(Rows Removed by Filter: 4812796), Now if we pushdown all the keys
then it will not show this value.

2. We can also check how much actual time reduced in the SeqScan node.
(i.e in above example on head it was 3216.157 whereas with patch it
was 1884.993).

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

On Thu, Nov 3, 2016 at 7:29 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Nov 1, 2016 at 8:31 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
>> By the way, I'm a bit skeptical whether this enhancement is really beneficial
>> than works for this enhancement, because we can now easily increase the number
>> of processor cores to run seq-scan with qualifier, especially, when it has high
>> selectivity.
>> How about your thought?
>
> Are you saying we don't need to both making sequential scans faster
> because we could just use parallel sequential scan instead? That
> doesn't sound right to me.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
heap_scankey_pushdown_expr_ctx.patch application/octet-stream 12.2 KB
TPCH_out.tar application/x-tar 60.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2016-11-13 05:20:22 Re: Do we need use more meaningful variables to replace 0 in catalog head files?
Previous Message Petr Jelinek 2016-11-13 04:16:56 Re: Logical Replication WIP