Re: Fillfactor for GIN indexes

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fillfactor for GIN indexes
Date: 2015-01-17 09:49:03
Message-ID: CAPpHfdvb_MsCa8Qtcj_+APDiDBE-von3AN3LeV9yrj_v9VCZYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 16, 2015 at 8:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Jan 15, 2015 at 7:06 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > Alexander Korotkov wrote:
> >> I'm not sure. On the one hand it's unclear why fillfactor should be
> >> different from 9.4.
> >> On the other hand it's unclear why it should be different from btree.
> >> I propose marking this "ready for committer". So, committer can make a
> final
> >> decision.
> > OK let's do so then. My preference is to fully pack the index at
> > build. GIN compression has been one of the headlines of 9.4.
>
> I'm struggling to understand why we shouldn't just reject this patch.
> On November 27th, Cedric said:
>
> "what are the benefits of this patch ? (maybe you had some test case
> or a benchmark ?)"
>
> Nobody replied. On January 15th, you (Michael) hypothesized that
> "this patch has value to control random updates on GIN indexes" but
> there seem to be absolutely no test results showing that any such
> value exists.
>
> There's only value in adding a fillfactor parameter to GIN indexes if
> it improves performance. There are no benchmarks showing it does.
> So, why are we still talking about this?

I already wrote quite detailed explanation of subject. Let mel try to
explain in shortly. GIN is two level nested btree. Thus, GIN would have
absolutely same benefits from fillfactor as btree. Lack of tests showing it
is, for sure, fault.

However, GIN posting trees are ordered by ItemPointer and this makes some
specific. If you have freshly created table and do inserts/updates they
would use the end of heap. Thus, inserts would go to the end of GIN posting
tree and fillfactor wouldn't affect anything. Fillfactor would give
benefits on HOT or heap space re-usage.

In the following example you can see that index size was increased greatly
while updating every 20th row. It's because every update causes page split
in index.

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from
generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=100,
fastupdate=off);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *3184 kB* │

# update test set v = array[1,2] where id%20 = 0;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *5264 kB* │
(1 row)

But if we create index with fillfactor=90, index size would remain the
same: no page splits.

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from
generate_series(1,1000000) id);
# create index test_idx90 on test using gin(v) with (fillfactor=90,
fastupdate=off);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx90 │ index │ smagen │ test │ *3520 kB* │

# update test set v = array[1,2] where id%20 = 0;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx90 │ index │ smagen │ test │ *3520 kB* │
(1 row)

Similar situation would be if we use fastupdate. But fastupdate takes some
space for pending lists which is independent from fillfactor.

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from
generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=100,
fastupdate=on);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *3184 kB* │

# update test set v = array[1,2] where id%20 = 0;
# vacuum test;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *7256 kB* │

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from
generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=90,
fastupdate=on);
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *3520 kB* │

# update test set v = array[1,2] where id%20 = 0;
# vacuum test;
# \di+
List of relations
Schema │ Name │ Type │ Owner │ Table │ Size │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
public │ test_idx100 │ index │ smagen │ test │ *5512 kB* │

BTW, previous version of patch contained some bugs. Revised version is
attached.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gin_fillfactor_4.patch application/octet-stream 8.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2015-01-17 11:21:01 Re: Fillfactor for GIN indexes
Previous Message Pavel Stehule 2015-01-17 06:32:05 Re: proposal: lock_time for pg_stat_database