Re: Relation extension scalability

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Relation extension scalability
Date: 2016-01-23 06:49:43
Message-ID: CAA4eK1+RCbb9E8UWPjK6ge1tgL8P4jbLe2bg4D8pA-L3p7Cxsw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 12, 2016 at 2:41 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:

> On Thu, Jan 7, 2016 at 4:53 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> On 2016-01-07 16:48:53 +0530, Amit Kapila wrote:
>>
>> I think it's a worthwhile approach to pursue. But until it actually
>> fixes the problem of leaving around uninitialized pages I don't think
>> it's very meaningful to do performance comparisons.
>>
>
> Attached patch solves this issue, I am allocating the buffer for each page
> and initializing the page, only after that adding to FSM.
>

Few comments about patch:

1.
Patch is not getting compiled.

1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared
identifier
1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared
identifier
1>src/backend/access/heap/hio.c(480): error C2065: 'buf' : undeclared
identifier

2.
! page = BufferGetPage(buffer);
! PageInit(page, BufferGetPageSize
(buf), 0);
!
! freespace = PageGetHeapFreeSpace(page);
!
MarkBufferDirty(buffer);
! UnlockReleaseBuffer(buffer);
!
RecordPageWithFreeSpace(relation, BufferGetBlockNumber(buffer), freespace);

What is the need to mark page dirty here, won't it automatically
be markerd dirty once the page is used? I think it is required
if you wish to WAL-log this action.

3. I think you don't need to multi-extend a relation if
HEAP_INSERT_SKIP_FSM is used, as for that case it anyways try to
get a new page by extending a relation.

4. Again why do you need this multi-extend optimization for local
relations (those only accessible to current backend)?

5. Do we need this for nbtree as well? One way to check that
is by Copying large data in table having index.

> Note: Test with both data and WAL on Magnetic Disk : No significant
> improvement visible
> -- I think wall write is becoming bottleneck in this case.
>
>
In that case, can you try the same test with un-logged tables?

Also, it is good to check the performance of patch with read-write work
load to ensure that extending relation in multiple-chunks should not
regress such cases.

> Currently i have kept extend_num_page as session level parameter but i
> think later we can make this as table property.
> Any suggestion on this ?
>
>
I think we should have a new storage_parameter at table level
extend_by_blocks or something like that instead of GUC. The
default value of this parameter should be 1 which means retain
current behaviour by default.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2016-01-23 07:13:14 Re: insert/update performance
Previous Message Jinhua Luo 2016-01-23 06:43:40 insert/update performance