Re: GiST VACUUM

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Костя Кузнецов <chapaev28(at)ya(dot)ru>
Subject: Re: GiST VACUUM
Date: 2019-03-15 19:20:06
Message-ID: CAMkU=1wOqJgjXMfO_R_rUjZ6dvba3x-xeCPBLYQZ1pTV1utLow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 5, 2019 at 8:21 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 05/03/2019 02:26, Andrey Borodin wrote:
> >> I also tried your amcheck tool with this. It did not report any
> >> errors.
> >>
> >> Attached is also latest version of the patch itself. It is the
> >> same as your latest patch v19, except for some tiny comment
> >> kibitzing. I'll mark this as Ready for Committer in the commitfest
> >> app, and will try to commit it in the next couple of days.
> >
> > That's cool! I'll work on 2nd step of these patchset to make
> > blockset data structure prettier and less hacky.
>
> Committed the first patch. Thanks for the patch!
>

Thank you. This is a transformational change; it will allow GiST indexes
larger than RAM to be used in some cases where they were simply not
feasible to use before. On a HDD, it resulted in a 50 fold improvement in
vacuum time, and the machine went from unusably unresponsive to merely
sluggish during the vacuum. On a SSD (albeit a very cheap laptop one, and
exposed from Windows host to Ubuntu over VM Virtual Box) it is still a 30
fold improvement, from a far faster baseline. Even on an AWS instance with
a "GP2" SSD volume, which normally shows little benefit from sequential
reads, I get a 3 fold speed up.

I also ran this through a lot of crash-recovery testing using simulated
torn-page writes using my traditional testing harness with high concurrency
(AWS c4.4xlarge and a1.4xlarge using 32 concurrent update processes) and
did not encounter any problems. I tested both with btree_gist on a scalar
int, and on tsvector with each tsvector having 101 tokens.

I did notice that the space freed up in the index by vacuum doesn't seem to
get re-used very efficiently, but that is an ancestral problem independent
of this change.

Cheers,

Jeff

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2019-03-15 19:26:50 Re: Facing issue in using special characters
Previous Message Robert Haas 2019-03-15 18:39:54 Re: hyrax vs. RelationBuildPartitionDesc