Re: HEAD seems to generate larger WAL regarding GIN index

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HEAD seems to generate larger WAL regarding GIN index
Date: 2014-03-15 22:15:14
Message-ID: CAPpHfdv+S_aOUHMop=FKtPPekB5vVgeoG4jFbD0Z594VTpEy8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 15, 2014 at 11:27 PM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:

> On 03/15/2014 08:40 PM, Fujii Masao wrote:
>
>> Hi,
>>
>> I executed the following statements in HEAD and 9.3, and compared
>> the size of WAL which were generated by data insertion in GIN index.
>>
>> ---------------------
>> CREATE EXTENSION pg_trgm;
>> CREATE TABLE hoge (col1 text);
>> CREATE INDEX hogeidx ON hoge USING gin (col1 gin_trgm_ops) WITH
>> (FASTUPDATE = off);
>>
>> CHECKPOINT;
>> SELECT pg_switch_xlog();
>> SELECT pg_switch_xlog();
>>
>> SELECT pg_current_xlog_location();
>> INSERT INTO hoge SELECT 'POSTGRESQL' FROM generate_series(1, 1000000);
>> SELECT pg_current_xlog_location();
>> ---------------------
>>
>> The results of WAL size are
>>
>> 960 MB (9.3)
>> 2113 MB (HEAD)
>>
>> The WAL size in HEAD was more than two times bigger than that in 9.3.
>> Recently the source code of GIN index has been changed dramatically.
>> Is the increase in GIN-related WAL intentional or a bug?
>>
>
> It was somewhat expected. Updating individual items on the new-format GIN
> pages requires decompressing and recompressing the page, and the
> recompressed posting lists need to be WAL-logged. Which generates much
> larger WAL records.
>
> That said, I didn't expect the difference to be quite that big when you're
> appending to the end of the table. When the new entries go to the end of
> the posting lists, you only need to recompress and WAL-log the last posting
> list, which is max 256 bytes long. But I guess that's still a lot more WAL
> than in the old format.
>
> That could be optimized, but I figured we can live with it, thanks to the
> fastupdate feature. Fastupdate allows amortizing that cost over several
> insertions. But of course, you explicitly disabled that...

Let me know if you want me to write patch addressing this issue.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-03-16 00:55:03 Minimum supported version of Python?
Previous Message Sergey Burladyan 2014-03-15 20:57:20 Re: BUG #9223: plperlu result memory leak