Re: Proposal: scan key push down to heap [WIP]

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: scan key push down to heap [WIP]
Date: 2016-10-29 06:47:36
Message-ID: CAFiTN-uRWHftdO7Qyqz3EtQDr0hMh-+S3o2mxTvuqng+kXYJnw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 26, 2016 at 12:01 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> The gains are quite noticeable in some cases. So if we can make it work
> without noticeable downsides...
>
> What I'm worried about though is that this, afaics, will quite
> noticeably *increase* total cost in cases with a noticeable number of
> columns and a not that selective qual. The reason for that being that
> HeapKeyTest() uses heap_getattr(), whereas upper layers use
> slot_getattr(). The latter "caches" repeated deforms, the former
> doesn't... That'll lead to deforming being essentially done twice, and
> it's quite often already a major cost of query processing.

What about putting slot reference inside HeapScanDesc ?. I know it
will make ,heap layer use executor structure but just a thought.

I have quickly hacked this way where we use slot reference in
HeapScanDesc and directly use
slot_getattr inside HeapKeyTest (only if we have valid slot otherwise
use _heap_getattr) and measure the worst case performance (what you
have mentioned above.)

My Test: (21 column table with varchar in beginning + qual is on last
few column + varying selectivity )

postgres=# \d test
Table "public.test"
Column | Type | Modifiers
--------+-------------------+-----------
f1 | integer |
f2 | character varying |
f3 | integer |
f4 | integer |
f5 | integer |
f6 | integer |
f7 | integer |
f8 | integer |
f9 | integer |
f10 | integer |
f11 | integer |
f12 | integer |
f13 | integer |
f14 | integer |
f15 | integer |
f16 | integer |
f17 | integer |
f18 | integer |
f19 | integer |
f20 | integer |
f21 | integer |

tuple count : 10000000 (10 Million)
explain analyze select * from test where f21< $1 and f20 < $1 and f19
< $1 and f15 < $1 and f10 < $1; ($1 vary from 1Million to 1Million).

Target code base:
-----------------------
1. Head
2. Heap_scankey_pushdown_v1
3. My hack for keeping slot reference in HeapScanDesc
(v1+use_slot_in_HeapKeyTest)

Result:
Selectivity Head scan_key_pushdown_v1 v1+use_slot_in_HeapKeyTest
0.1 3880 2980 2747
0.2 4041 3187 2914
0.5 5051 4921 3626
0.8 5378 7296 3879
1.0 6161 8525 4575

Performance graph is attached in the mail..

Observation:
----------------
1. Heap_scankey_pushdown_v1, start degrading after very high
selectivity (this behaviour is only visible if table have 20 or more
columns, I tested with 10 columns but with that I did not see any
regression in v1).

2. (v1+use_slot_in_HeapKeyTest) is always winner, even at very high selectivity.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
image/png 26.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Karl O. Pinc 2016-10-29 12:38:26 Re: Patch to implement pg_current_logfile() function
Previous Message Merlin Moncure 2016-10-28 21:17:36 Re: emergency outage requiring database restart