Re: Index Bloat Problem

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Strahinja Kustudić <strahinjak(at)nordeus(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Index Bloat Problem
Date: 2012-08-18 02:33:38
Message-ID: CAMkU=1ysTfuS+7DkvmhASN4rzzTJLGxkb4QKd-r6gFV6jkwwBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Aug 16, 2012 at 12:57 PM, Strahinja Kustudić
<strahinjak(at)nordeus(dot)com> wrote:
>
> @Jeff I'm not sure if I understand what you mean? I know that we never reuse
> key ranges. Could you be more clear, or give an example please.

If an index leaf page is completely empty because every entry on it
were deleted, it will get recycled to be used in some other part of
the index. (Eventually--it can take a while, especially if you have
long-running transactions).

But if the leaf page is only mostly empty, because only most of
entries on it were deleted, than it can never be reused, except for
entries that naturally fall into its existing key range (which will
never happen, if you never reuse key ranges)

So if you have a million records with keys 1..1000000, and do a
"delete from foo where key between 1 and 990000", then 99% of those
old index pages will become completely empty and eligible for reuse.
But if you do "delete from foo where key%100>0", then all of the pages
will become 99% empty, and none will be eligible for reuse (except the
very last one, which can still accept 1000001 and so on)

There has been talk of allowing logically adjacent, mostly empty
pages to be merged so that one of them becomes empty, but the way
concurrent access to btree indexes was designed this is extremely hard
to do safely.

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Williamson 2012-08-18 08:01:44 Re: Index Bloat Problem
Previous Message Jeff Janes 2012-08-17 18:42:27 Re: Increasing WAL usage followed by sudden drop