Re: Avoiding smgrimmedsync() during nbtree index builds

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Subject: Re: Avoiding smgrimmedsync() during nbtree index builds
Date: 2022-07-23 09:34:54
Message-ID: d410c275-7181-52ce-63ae-0af229b99a70@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 05/03/2022 00:03, Melanie Plageman wrote:
> On Wed, Mar 2, 2022 at 8:09 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>>
>> Rebased to appease cfbot.
>>
>> I ran these paches under a branch which shows code coverage in cirrus. It
>> looks good to my eyes.
>> https://api.cirrus-ci.com/v1/artifact/task/5212346552418304/coverage/coverage/00-index.html
>
> thanks for doing that and for the rebase! since I made updates, the
> attached version 6 is also rebased.

I'm surprised by all the changes in nbtsort.c to choose between using
the buffer manager or the new API. I would've expected the new API to
abstract that away. Otherwise we need to copy that logic to all the
index AMs.

I'd suggest an API along the lines of:

/*
* Start building a relation in bulk.
*
* If the relation is going to be small, we will use the buffer manager,
* but if it's going to be large, this will skip the buffer manager and
* write the pages directly to disk.
*/
bulk_init(SmgrRelation smgr, BlockNumber estimated_size)

/*
* Extend relation by one page
*/
bulk_extend(SmgrRelation, BlockNumber, Page)

/*
* Finish building the relation
*
* This will fsync() the data to disk, if required.
*/
bulk_finish()

Behind this interface, you can encapsulate the logic to choose whether
to use the buffer manager or not. And I think bulk_extend() could do the
WAL-logging too.

Or you could make the interface look more like the buffer manager:

/* as above */
bulk_init(SmgrRelation smgr, BlockNumber estimated_size)
bulk_finish()

/*
* Get a buffer for writing out a page.
*
* The contents of the buffer are uninitialized. The caller
* must fill it in before releasing it.
*/
BulkBuffer
bulk_getbuf(SmgrRelation smgr, BlockNumber blkno)

/*
* Release buffer. It will be WAL-logged and written out to disk.
* Not necessarily immediately, but at bulk_finish() at latest.
*
* NOTE: There is no way to read back the page after you release
* it, until you finish the build with bulk_finish().
*/
void
bulk_releasebuf(SmgrRelation smgr, BulkBuffer buf)

With this interface, you could also batch multiple pages and WAL-log
them all in one WAL record with log_newpage_range(), for example.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2022-07-23 09:40:44 Re: Amcheck verification of GiST and GIN
Previous Message Bharath Rupireddy 2022-07-23 09:28:53 A proposal for shared memory based backup infrastructure