Re: GiST VACUUM

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Костя Кузнецов <chapaev28(at)ya(dot)ru>
Subject: Re: GiST VACUUM
Date: 2018-07-30 15:39:54
Message-ID: 844f3373-486e-a34d-d5a1-6a048e19355c@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29/07/18 14:47, Andrey Borodin wrote:
> Fixed both problems. PFA v14.

Thanks, took a really quick look at this.

The text being added to README is outdated for these latest changes.

> In second step I still use paloc's memory, but only to store two
> bitmaps: bitmap of internal pages and bitmap of empty leafs. Second
> physical scan only reads internal pages. I can omit that bitmap, if
> I'll scan everything. Also, I can replace emptyLeafs bitmap with
> array\list, but I do not really think it will be big.

On a typical GiST index, what's the ratio of leaf vs. internal pages?
Perhaps an array would indeed be better. If you have a really large
index, the bitmaps can take a fair amount of memory, on top of the
memory used for tracking the dead TIDs. I.e. that memory will be in
addition to maintenance_work_mem. That's not nice, but I think it's OK
in practice, and not worth spending too much effort to eliminate. For a
1 TB index with default 8k block size, the two bitmaps will take 32 MB
of memory in total. If you're dealing with a database of that size, you
ought to have some memory to spare. But if an array would use less
memory, that'd be better.

If you go with bitmaps, please use the existing Bitmapset instead of
rolling your own. Saves some code, and it provides more optimized
routines for iterating through all the set bits, too
(bms_next_member()). Another possibility would be to use Tidbitmap, in
the "lossy" mode, i.e. add the pages with tbm_add_page(). That might
save some memory, compared to Bitmapset, if the bitmap is very sparse.
Not sure how it compares with a plain array.

A straightforward little optimization would be to skip scanning the
internal pages, when the first scan didn't find any empty pages. And
stop the scan of the internal pages as soon as all the empty pages have
been recycled.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bossart, Nathan 2018-07-30 15:42:55 Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack
Previous Message Jonathan S. Katz 2018-07-30 15:14:31 Re: Explain buffers wrong counter with parallel plans