Re: Pluggable Storage - Andres's take

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pluggable Storage - Andres's take
Date: 2018-09-21 06:57:43
Message-ID: CAJrrPGfQfiNE6Saw1edfCBZ5advfv=YxTwDRWJ4hUPZScvGmYA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 10, 2018 at 5:42 PM Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
wrote:

> On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
> wrote:
>
>>
>> On Tue, Sep 4, 2018 at 10:33 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the patches!
>>>
>>> On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:
>>> > I found a couple of places where zheap uses extra logic to verify
>>> > whether the relation uses the zheap AM, and makes some extra
>>> > decisions based on that.
>>> > I am analyzing all of that extra code to see whether any callbacks
>>> > can handle it, and how. I can come back with more details later.
>>>
>>> Yea, I think some of them will need to stay (particularly around
>>> integrating undo) and some other ones we'll need to abstract.
>>>
>>
>> OK. I will list all the areas that I found, along with my observations on
>> whether to abstract each one or leave it, and then implement around that.
>>
>
> The following are the changes where the code specifically checks whether
> it is a zheap relation or not.
>
> Overall, I found that it needs 3 new APIs at the following locations.
> 1. RelationSetNewRelfilenode
> 2. heap_create_init_fork
> 3. estimate_rel_size
> 4. Facility to provide handler options (such as skipping WAL).
>

While porting the Fujitsu in-memory columnar store on top of pluggable
storage, I found that the callers of "heap_beginscan" expect that the
returned data always contains all the records.

For example, in a sequential scan, the heap returns a slot containing
either the tuple or a value array of all the columns. The data is then
filtered, and the unnecessary columns are removed later by projection.
This works fine for row-based storage. For columnar storage, if the
storage knows that the upper layers need only particular columns, it
can return just those columns directly, and there is no need for a
projection step. This would also let the columnar storage return the
required columns faster.

Would it be good to pass the plan to the storage, so that it can find out
which columns need to be returned? Also, if the storage itself can handle
the projection in some scenarios, the callers need to be informed that
there is no need to perform the projection separately.
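
To make the idea concrete, here is a minimal standalone sketch. It is not
taken from any patch and does not use the real PostgreSQL API: every name in
it (DemoColumnarTable, demo_beginscan, demo_getnext,
demo_scan_needs_projection) is hypothetical, and a plain bitmask of needed
columns stands in for whatever the plan would actually pass down. It only
shows the shape of the interaction: the caller tells the storage which
columns it needs, and the storage reports whether a projection step is still
required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NCOLS 4
#define DEMO_NROWS 3

/* Hypothetical column-oriented table: one array of values per column. */
typedef struct DemoColumnarTable
{
    int32_t cols[DEMO_NCOLS][DEMO_NROWS];
} DemoColumnarTable;

/* Hypothetical scan state; not related to PostgreSQL's scan descriptors. */
typedef struct DemoScanDesc
{
    const DemoColumnarTable *table;
    uint32_t needed_cols;       /* bit i set => caller needs column i */
    size_t   next_row;          /* next row to return */
    bool     projection_done;   /* storage returns only the needed columns */
} DemoScanDesc;

/* Hypothetical result slot: valid[i] is false for columns not fetched. */
typedef struct DemoSlot
{
    int32_t values[DEMO_NCOLS];
    bool    valid[DEMO_NCOLS];
} DemoSlot;

/* Start a scan, telling the storage up front which columns are needed. */
DemoScanDesc
demo_beginscan(const DemoColumnarTable *table, uint32_t needed_cols)
{
    DemoScanDesc scan;

    assert(table != NULL);
    scan.table = table;
    scan.needed_cols = needed_cols;
    scan.next_row = 0;
    /* This columnar storage always prunes columns itself. */
    scan.projection_done = true;
    return scan;
}

/* Fetch the next row; only the requested column arrays are read. */
bool
demo_getnext(DemoScanDesc *scan, DemoSlot *slot)
{
    if (scan->next_row >= DEMO_NROWS)
        return false;

    for (int c = 0; c < DEMO_NCOLS; c++)
    {
        bool wanted = ((scan->needed_cols >> c) & 1u) != 0;

        slot->valid[c] = wanted;
        slot->values[c] = wanted ? scan->table->cols[c][scan->next_row] : 0;
    }
    scan->next_row++;
    return true;
}

/* Lets the caller know whether it still has to project the result. */
bool
demo_scan_needs_projection(const DemoScanDesc *scan)
{
    return !scan->projection_done;
}
```

A row-based storage in this sketch would simply set projection_done to
false and fill every column, so existing callers would keep projecting as
they do today. Note that this covers only column pruning; evaluating
expressions in the target list is left out of the sketch.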

Comments?

Regards,
Haribabu Kommi
Fujitsu Australia
