Re: FSM versus GIN pending list bloat

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM versus GIN pending list bloat
Date: 2015-08-04 13:35:08
Message-ID: CANP8+jKUFEY0T2ZUZXW-TjkS3zYRrFoKbxN_-mMXFF+_oFAvjQ@mail.gmail.com
Lists: pgsql-hackers

On 4 August 2015 at 09:39, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On 4 August 2015 at 06:03, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>
>> The attached proof of concept patch greatly improves the bloat for both
>> the insert and the update cases. You need to turn on both features: adding
>> the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).
>> The first of those two things could probably be adopted for real, but the
>> second probably is not acceptable. What is the right way to do this?
>> Could a variant of RecordFreeIndexPage bubble the free space up the map
>> immediately rather than waiting for a vacuum? It would only have to move
>> up until it found a page with freespace already recorded in it, which the
>> vast majority of the time would mean observing up one level and then not
>> writing to it, assuming the pending list pages remain well clustered.
>>
>
> You make a good case for action here since insert only tables with GIN
> indexes on text are a common use case for GIN.
>
> Why would vacuuming the FSM be unacceptable? With a
> large gin_pending_list_limit it makes sense.
>
> If it is unacceptable, perhaps we can avoid calling it every time, or
> simply have FreeSpaceMapVacuum() terminate more quickly on some kind of
> 80/20 heuristic for this case.
>
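
To make the bubble-up idea Jeff describes concrete, here is a toy sketch in C. It is not PostgreSQL's FSM code and not a variant of RecordFreeIndexPage: the tree layout, the sizes and the function name record_free_and_bubble are all assumptions made for illustration. It models the map as a single binary max-tree and only handles the "page became free" direction, stopping at the first ancestor that already advertises enough free space.

```c
#include <assert.h>

#define LEAVES 8                 /* toy size, assumed; must be a power of two */
#define NODES  (2 * LEAVES - 1)  /* complete binary tree stored in heap layout */

/*
 * Hypothetical helper, not a PostgreSQL function.
 *
 * Record `avail` for leaf `slot`, then walk toward the root and stop at the
 * first ancestor whose value is already >= `avail`. Returns the number of
 * ancestors that had to be rewritten.
 *
 * Only valid when the leaf's value is not being lowered; lowering would
 * require recomputing each parent as the max of its children.
 */
static int
record_free_and_bubble(unsigned char *tree, int slot, unsigned char avail)
{
    int node;
    int writes = 0;

    assert(slot >= 0 && slot < LEAVES);

    node = (LEAVES - 1) + slot;   /* leaves occupy the last LEAVES entries */
    tree[node] = avail;

    while (node > 0)
    {
        int parent = (node - 1) / 2;

        if (tree[parent] >= avail)
            break;                /* free space is already visible from here up */

        tree[parent] = avail;
        writes++;
        node = parent;
    }
    return writes;
}
```

In this model, freeing slot 0 in an all-zero tree rewrites all three ancestors, while freeing the neighbouring slot 1 immediately afterwards rewrites none, because the shared parent already records the free space. That is the "look up one level and then don't write" behaviour the proposal relies on when pending list pages stay well clustered.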

A couple of questions here:

* the docs say "it's desirable to have pending-list cleanup occur in the
background", but there is no way to invoke that, except via VACUUM. I think
we need a separate function to be able to call this as a background action.
If we had that, we wouldn't need much else, would we?

* why do we have two parameters: gin_pending_list_limit and fastupdate?
What happens if we set gin_pending_list_limit but don't set fastupdate?

* how do we know what value to set gin_pending_list_limit to? Is there a
way of knowing that the limit has been reached?

This and the OP seem like 9.5 open items to me.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
