Re: TABLESAMPLE patch

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-09 03:51:41
Message-ID: CAA4eK1JkB_KNvV8G14ek7pEMN=hTMXFPOqrf_Hae-L3OzvH5Qg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> On 05/03/15 09:21, Amit Kapila wrote:
>>
>> On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
>> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
>> >
>> >
>> > I didn't add the whole page visibility caching as the tuple ids we
>> get from sampling methods don't map well to the visibility info we get
>> from heapgetpage (it maps to the values in the rs_vistuples array not to
>> to its indexes). Commented about it in code also.
>> >
>>
>> I think we should set pagemode for system sampling as it can
>> have dual benefit, one is it will allow us caching tuples and other
>> is it can allow us pruning of page which is done in heapgetpage().
>> Do you see any downside to it?
>
>
> Double checking for tuple visibility is the only downside I can think of.

That will happen if we use heapgetpage and the way currently
code is written in patch, however we can easily avoid double
checking if we don't call heapgetpage and rather do the required
work at caller's place.

> Plus some added code complexity of course. I guess we could use binary
search on rs_vistuples (it's already sorted) so that info won't be
completely useless. Not sure if worth it compared to normal visibility
check, but not hard to do.
>

It seems to me that it is better to avoid locking/unlocking buffer
wherever possible.

> I personally don't see the page pruning in TABLESAMPLE as all that
important though, considering that in practice it will only scan tiny
portion of a relation and usually one that does not get many updates (in
practice the main use-case is BI).
>

In that case, I think it should be acceptable either way, because
if there are less updates then anyway it won't incur any cost of
doing the pruning.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2015-03-09 05:33:28 Re: [REVIEW] Re: Compression of full-page-writes
Previous Message Ashutosh Bapat 2015-03-09 03:49:08 Re: Join push-down support for foreign tables