Re: HEAD seems to generate larger WAL regarding GIN index

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HEAD seems to generate larger WAL regarding GIN index
Date: 2014-03-31 16:41:44
Message-ID: CA+Tgmoa2Ny=0AeJe5YRuZ5PSKna0qJSDx3_BE41Q65EYypsiJQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 20, 2014 at 1:12 PM, Jesper Krogh <jesper(at)krogh(dot)cc> wrote:
> On 15/03/14 20:27, Heikki Linnakangas wrote:
>> That said, I didn't expect the difference to be quite that big when you're
>> appending to the end of the table. When the new entries go to the end of the
>> posting lists, you only need to recompress and WAL-log the last posting
>> list, which is max 256 bytes long. But I guess that's still a lot more WAL
>> than in the old format.
>>
>> That could be optimized, but I figured we can live with it, thanks to the
>> fastupdate feature. Fastupdate allows amortizing that cost over several
>> insertions. But of course, you explicitly disabled that...
>
> In a concurrent-update environment, fastupdate as it is in 9.2 is not
> really useful. It may let you batch up insertions, but you have no control
> over who ends up paying the debt. Doubling the amount of WAL from GIN
> indexing would be pretty tough for us: on 9.2 we generate roughly 1TB of
> WAL per day, and we keep it for some weeks to be able to do PITR. That WAL
> is mainly due to GIN-index updates, as new data is added and needs to be
> searchable by users. We do run gzip, which cuts it down to 25-30% before we
> retain it for long, but doubling the volume is still going to be a
> migration challenge.
>
> If fastupdate could be made to work in an environment where users are both
> searching the index and manually updating it, with 4+ backend processes
> updating the index concurrently, then it would be a real benefit.
>
> The GIN index currently contains 70+ million records, with an average
> tsvector of 124 terms.

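For context, fastupdate is a per-index storage parameter, so it can be
toggled without rebuilding the index. A minimal sketch of the knobs
involved (the table and index names here are hypothetical):

    -- fastupdate defaults to on; Jesper runs with it explicitly off
    CREATE INDEX docs_tsv_idx ON docs USING gin (tsv)
        WITH (fastupdate = off);

    -- it can be flipped in place; only entries inserted after the
    -- change go through the pending list
    ALTER INDEX docs_tsv_idx SET (fastupdate = on);

With fastupdate on, inserts append cheaply to the pending list, which is
merged into the main index structure once it grows past work_mem (as of
9.4) or when VACUUM processes the table. That merge is what amortizes the
per-insertion WAL cost Heikki describes, and it is also the "debt" Jesper
mentions: whichever backend happens to trigger the merge pays for the
whole batch.
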
Should we try to install some hack around fastupdate for 9.4? I fear
the divergence between reasonable values of work_mem and reasonable
sizes for that pending list is only going to keep getting bigger. I'm
sure there's somebody out there who has work_mem = 16GB, and commits
like 263865a48973767ce8ed7b7788059a38a24a9f37 will only increase the
appeal of large values.
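
To make the mismatch concrete, a sketch under the same assumptions (the
value is illustrative):

    -- work_mem is a per-sort/per-hash budget, so large values are
    -- perfectly reasonable for analytics sessions...
    SET work_mem = '16GB';

    -- ...but with fastupdate on, the same setting bounds the GIN
    -- pending list, so one unlucky INSERT stalls while gigabytes of
    -- pending entries are merged into the main index.

The value that makes big sorts fast and the value that keeps pending-list
merges (and their WAL bursts) small are pulled in opposite directions by
a single GUC.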

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
