FSM versus GIN pending list bloat

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: FSM versus GIN pending list bloat
Date: 2015-08-04 05:03:00
Message-ID: CAMkU=1xfE1MnGMkv655hB8jCs3PBTb4S5H+FnQv8kcmYzyeBDQ@mail.gmail.com
Lists: pgsql-hackers

For a GIN index with fastupdate turned on, both the user backends and
the autoanalyze routine will clear out the pending list, pushing the entries
into the normal index structure and deleting the pages used by the pending
list. But those deleted pages do not get added to the free space map
until a vacuum is done. This leads to horrible bloat on insert-only
tables, as they are never vacuumed and so the pending list space is never
reused. And the pending list is very inefficient in space usage to start
with, even compared to the old-style posting lists and especially compared
to the new compressed ones. (If the pages were aggressively recycled, this
inefficient use wouldn't be much of a problem.)
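
To make the mechanism concrete, here is a toy model of it, not PostgreSQL code. The function name, its parameters, and the numbers in the usage note are all made up for illustration. It assumes every pending list fill uses the same number of pages, that no vacuum ever runs, and it ignores growth of the main index structure.

```python
def simulate_index_pages(cycles, pages_per_flush, record_on_flush):
    """Return pages allocated to the index after `cycles` pending-list fills.

    Toy model (hypothetical, not PostgreSQL internals): each cycle the
    pending list needs `pages_per_flush` pages, is then flushed, and its
    pages are deleted. No vacuum ever runs (insert-only table).
    """
    total_pages = 0   # pages the index file has been extended by
    reusable = 0      # deleted pages currently visible in the free space map
    for _ in range(cycles):
        # Reuse whatever the free space map knows about, extend for the rest.
        reused = min(reusable, pages_per_flush)
        reusable -= reused
        total_pages += pages_per_flush - reused
        if record_on_flush:
            # Deleted pending-list pages are recorded as free right away.
            reusable += pages_per_flush
        # Otherwise the deleted pages stay invisible until a vacuum,
        # which in this model never happens.
    return total_pages
```

With 100 cycles of 50 pages each, the model allocates 5000 pages when the deleted pages are never recorded and 50 pages when they are recorded at flush time. The real ratio depends on workload, but the shape of the problem is the same.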

Even on a table receiving mostly updates after its initial population (and
so being vacuumed regularly) with the default autovacuum settings, there is
a lot of bloat.

The attached proof of concept patch greatly reduces the bloat for both the
insert and the update cases. You need to turn on both features: adding the
pages to the FSM, and vacuuming the FSM, to get the benefit (so JJ_GIN=3). The
first of those two things could probably be adopted for real, but the
second probably is not acceptable. What is the right way to do this?
Could a variant of RecordFreeIndexPage bubble the free space up the map
immediately rather than waiting for a vacuum? It would only have to move
up until it found a page with free space already recorded in it, which the
vast majority of the time would mean looking up one level and then not
writing to it, assuming the pending list pages remain well clustered.
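
A rough sketch of the early-stopping bubble-up idea, as a toy in-memory max-tree rather than the real multi-page FSM. The class and method names (ToyFreeSpaceTree, record_free, find_free) are hypothetical and are not PostgreSQL APIs. It only handles the "page became free" direction, which is the only case where stopping early is safe.

```python
class ToyFreeSpaceTree:
    """Binary max-tree over page slots; node i has children 2i+1 and 2i+2.

    Hypothetical illustration only; the real FSM is spread over pages
    and is more involved than this.
    """

    def __init__(self, n_leaves):
        # Require a power of two so the array layout stays simple.
        assert n_leaves > 0 and n_leaves & (n_leaves - 1) == 0
        self.first_leaf = n_leaves - 1
        self.nodes = [0] * (2 * n_leaves - 1)

    def record_free(self, page, avail):
        """Record `avail` for `page` and bubble it up immediately.

        Stops at the first ancestor that already advertises at least
        `avail`. Returns the number of interior nodes written.
        """
        idx = self.first_leaf + page
        if avail < self.nodes[idx]:
            raise ValueError("sketch only handles increases in free space")
        self.nodes[idx] = avail
        written = 0
        while idx > 0:
            idx = (idx - 1) // 2          # move to the parent
            if self.nodes[idx] >= avail:  # ancestor already knows: stop
                break
            self.nodes[idx] = avail
            written += 1
        return written

    def find_free(self, needed):
        """Return a page with at least `needed` free space, or None."""
        if self.nodes[0] < needed:
            return None
        idx = 0
        while idx < self.first_leaf:
            left = 2 * idx + 1
            idx = left if self.nodes[left] >= needed else left + 1
        return idx - self.first_leaf
```

In an 8-page tree, freeing page 3 writes all three levels above it, but then freeing the adjacent page 2 looks at its parent, finds the space already recorded, and writes nothing above the leaf. That is the well-clustered case described above.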

Or would a completely different approach be better, like managing the
vacated pending list pages directly in the index without going to the FSM?

Cheers,

Jeff

Attachment Content-Type Size
gin_fast_freespace.patch application/octet-stream 2.3 KB
