Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist

From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Steele <david(at)pgmasters(dot)net>, a(dot)lubennikova(at)postgrespro(dot)ru
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date: 2019-03-26 09:29:03
Message-ID: 670845ba-91a5-0bc5-9cf6-6b0e8ba287cf@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 25/03/2019 15:21, Heikki Linnakangas wrote:
> On 25/03/2019 09:57, David Steele wrote:
>> On 2/6/19 2:08 PM, Andrey Lepikhov wrote:
>>> The patchset had a problem with all-zero pages, has appeared at index
>>> build stage: the generic_log_relation() routine sends all pages into the
>>> WAL. So  lsn field at all-zero page was initialized and the
>>> PageIsVerified() routine detects it as a bad page.
>>> The solution may be:
>>> 1. To improve index build algorithms and eliminate the possibility of
>>> not used pages appearing.
>>> 2. To mark each page as 'dirty' right after initialization. In this case
>>> we will got 'empty' page instead of the all-zeroed.
>>> 3. Do not write into the WAL all-zero pages.
>
> Hmm. When do we create all-zero pages during index build? That seems
> pretty surprising.
GIST uses buffered pages. During GIST build it is possible (very rarely)
what no one index tuple was written to the block page before new block
was allocated. And the page has become an all-zero page.
You can't have problems in the current GIST code, because it writes into
the WAL only changed pages.
But the idea of the patch is traversing blocks of index relation
one-by-one after end of index building process and write this blocks to
the WAL. In this case we will see all-zeroed pages and need to check it.
>
>>> On 04.02.2019 10:04, Michael Paquier wrote:
>>>> On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
>>>>> Ok. It is used only for demonstration.
>>>>
>>>> The latest patch set needs a rebase, so moved to next CF, waiting on
>>>> author as this got no reviews.
>>
>> The patch no longer applies so marked Waiting on Author.
>>
>> Alexander, Heikki, are either of you planning to review the patch in
>> this CF?
>
> I had another quick look.
>
> I still think using the "generic xlog AM" for this is a wrong level of
> abstraction, and we should use the XLOG_FPI records for this directly.
> We can extend XLOG_FPI so that it can store multiple pages in a single
> record, if it doesn't already handle it.
>
> Another counter-point to using the generic xlog record is that you're
> currently doing unnecessary two memcpy's of all pages in the index, in
> GenericXLogRegisterBuffer() and GenericXLogFinish(). That's not free.
>
> I guess the generic_log_relation() function can stay where it is, but it
> should use XLogRegisterBuffer() and XLogInsert() directly.
Ok. This patch waited feedback for a long time. I will check the GIST
code changes from previous review and will try to use your advice.
>
> - Heikki

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-03-26 09:30:39 Re: psql display of foreign keys
Previous Message Pavan Deolasee 2019-03-26 09:26:31 Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits