Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: David Steele <david(at)pgmasters(dot)net>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, a(dot)lubennikova(at)postgrespro(dot)ru
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date: 2019-03-25 10:21:29
Message-ID: b7486071-1b2b-11c6-4412-97dd1f42481d@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 25/03/2019 09:57, David Steele wrote:
> On 2/6/19 2:08 PM, Andrey Lepikhov wrote:
>> The patchset had a problem with all-zero pages, has appeared at index
>> build stage: the generic_log_relation() routine sends all pages into the
>> WAL. So lsn field at all-zero page was initialized and the
>> PageIsVerified() routine detects it as a bad page.
>> The solution may be:
>> 1. To improve index build algorithms and eliminate the possibility of
>> not used pages appearing.
>> 2. To mark each page as 'dirty' right after initialization. In this case
>> we will got 'empty' page instead of the all-zeroed.
>> 3. Do not write into the WAL all-zero pages.

Hmm. When do we create all-zero pages during index build? That seems
pretty surprising.

>> On 04.02.2019 10:04, Michael Paquier wrote:
>>> On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
>>>> Ok. It is used only for demonstration.
>>>
>>> The latest patch set needs a rebase, so moved to next CF, waiting on
>>> author as this got no reviews.
>
> The patch no longer applies so marked Waiting on Author.
>
> Alexander, Heikki, are either of you planning to review the patch in
> this CF?

I had another quick look.

I still think using the "generic xlog AM" for this is a wrong level of
abstraction, and we should use the XLOG_FPI records for this directly.
We can extend XLOG_FPI so that it can store multiple pages in a single
record, if it doesn't already handle it.

Another counter-point to using the generic xlog record is that you're
currently doing unnecessary two memcpy's of all pages in the index, in
GenericXLogRegisterBuffer() and GenericXLogFinish(). That's not free.

I guess the generic_log_relation() function can stay where it is, but it
should use XLogRegisterBuffer() and XLogInsert() directly.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2019-03-25 10:38:56 Re: libpq compression
Previous Message Fabien COELHO 2019-03-25 09:29:46 Re: reorder pg_rewind control file sync