Re: GiST VACUUM

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Костя Кузнецов <chapaev28(at)ya(dot)ru>
Subject: Re: GiST VACUUM
Date: 2018-07-31 20:06:45
Message-ID: 4A1FF988-A69D-4607-8A6E-B3EC0FD6C2CF@yandex-team.ru
Lists: pgsql-hackers

Hi! Thanks for looking into the patch!

> On 30 July 2018, at 18:39, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 29/07/18 14:47, Andrey Borodin wrote:
>> Fixed both problems. PFA v14.
>
> Thanks, took a really quick look at this.
>
> The text being added to README is outdated for these latest changes.
Fixed.
>
>> In the second step I still use palloc'd memory, but only to store two
>> bitmaps: a bitmap of internal pages and a bitmap of empty leaves. The second
>> physical scan only reads internal pages. I can omit that bitmap if
>> I scan everything. Also, I can replace the emptyLeafs bitmap with an
>> array/list, but I do not really think it will be big.
>
> On a typical GiST index, what's the ratio of leaf vs. internal pages? Perhaps an array would indeed be better.
Typical GiST has around 200 tuples per internal page. I've switched to List since it's more efficient than bitmap. Is
> If you have a really large index, the bitmaps can take a fair amount of memory, on top of the memory used for tracking the dead TIDs. I.e. that memory will be in addition to maintenance_work_mem. That's not nice, but I think it's OK in practice, and not worth spending too much effort to eliminate. For a 1 TB index with default 8k block size, the two bitmaps will take 32 MB of memory in total. If you're dealing with a database of that size, you ought to have some memory to spare. But if an array would use less memory, that'd be better.
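
The 32 MB figure quoted above follows from keeping one bit per index page. A minimal sketch of that arithmetic, with hypothetical helper names that are not part of the patch; the array comparison assumes a plain array of 4-byte block numbers (BlockNumber is a 32-bit integer in PostgreSQL) and ignores any per-element overhead a List would add:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a bitmap that holds one bit per index page. */
static uint64_t
bitmap_bytes(uint64_t index_bytes, uint64_t block_size)
{
    assert(block_size > 0);
    uint64_t npages = index_bytes / block_size;  /* 1 TB / 8 kB = 2^27 pages */
    return (npages + 7) / 8;                     /* round up to whole bytes */
}

/* Bytes needed for a plain array of 4-byte block numbers (assumption:
 * no per-element overhead, unlike a linked list). */
static uint64_t
blkno_array_bytes(uint64_t nentries)
{
    return nentries * sizeof(uint32_t);
}
```

Under these assumptions each bitmap for a 1 TB index takes 16 MB, so two take 32 MB, and the array is smaller than one bitmap as long as fewer than one page in 32 has to be remembered.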

>
> If you go with bitmaps, please use the existing Bitmapset instead of rolling your own. Saves some code, and it provides more optimized routines for iterating through all the set bits, too (bms_next_member()). Another possibility would be to use Tidbitmap, in the "lossy" mode, i.e. add the pages with tbm_add_page(). That might save some memory, compared to Bitmapset, if the bitmap is very sparse. Not sure how it compares with a plain array.
Yeah, I've stopped reinventing that bicycle. But I have to note that the default growth strategy of Bitmapset is not good: we will be repalloc'ing byte by byte.
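
To illustrate the growth concern, here is a standalone toy page set, not PostgreSQL's Bitmapset and not code from the patch. It assumes a doubling growth policy, so adding ever-larger block numbers triggers only a logarithmic number of reallocations, and its iterator is called in the same loop style as bms_next_member() (start from -1, stop on a negative result). All names (PageSet, pageset_add, pageset_next) are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct PageSet
{
    size_t    nwords;   /* number of allocated 64-bit words */
    uint64_t *words;    /* bit N set means block N is in the set */
} PageSet;

/* Add a block number, doubling the allocation when it does not fit. */
static void
pageset_add(PageSet *s, uint32_t blkno)
{
    size_t w = blkno / 64;

    if (w >= s->nwords)
    {
        size_t newn = s->nwords ? s->nwords : 16;

        while (newn <= w)
            newn *= 2;              /* geometric growth, not exact-fit */

        uint64_t *p = realloc(s->words, newn * sizeof(uint64_t));
        assert(p != NULL);
        /* zero only the newly added tail */
        memset(p + s->nwords, 0, (newn - s->nwords) * sizeof(uint64_t));
        s->words = p;
        s->nwords = newn;
    }
    s->words[w] |= UINT64_C(1) << (blkno % 64);
}

/* Return the smallest member greater than prev, or -1 if there is none.
 * Start the iteration with prev = -1. */
static int64_t
pageset_next(const PageSet *s, int64_t prev)
{
    uint64_t b = (uint64_t) (prev + 1);

    while (b / 64 < s->nwords)
    {
        uint64_t word = s->words[b / 64] >> (b % 64);

        if (word == 0)
        {
            b = (b / 64 + 1) * 64;  /* nothing left in this word, skip it */
            continue;
        }
        while ((word & 1) == 0)     /* advance to the lowest set bit */
        {
            word >>= 1;
            b++;
        }
        return (int64_t) b;
    }
    return -1;
}
```

A caller would iterate with `for (int64_t b = -1; (b = pageset_next(&set, b)) >= 0; )`, which visits members in ascending block order.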

>
> A straightforward little optimization would be to skip scanning the internal pages, when the first scan didn't find any empty pages. And stop the scan of the internal pages as soon as all the empty pages have been recycled.
Done.
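
The control flow of that optimization can be sketched roughly as follows. This is a simplified model, not the code in the attached patch: InternalPage, nempty_children and second_pass are hypothetical names, and "recycling" is reduced to decrementing a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an internal page: how many of its downlinks
 * point to leaves that the first scan found empty (hypothetical). */
typedef struct InternalPage
{
    int nempty_children;
} InternalPage;

/* Model of the second pass: returns how many internal pages were read. */
static size_t
second_pass(const InternalPage *pages, size_t npages, int empty_leaves_left)
{
    size_t visited = 0;

    assert(empty_leaves_left >= 0);

    /* The first scan found no empty leaves: skip the second scan entirely. */
    if (empty_leaves_left == 0)
        return 0;

    /* Stop as soon as every empty leaf has been dealt with. */
    for (size_t i = 0; i < npages && empty_leaves_left > 0; i++)
    {
        visited++;
        empty_leaves_left -= pages[i].nempty_children;
    }
    return visited;
}
```

With both checks in place, the cost of the second pass depends on where the parents of the empty leaves happen to sit, and drops to zero when there is nothing to delete.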

PFA v15.

Best regards, Andrey Borodin.

Attachment                                      Content-Type              Size
0002-Delete-pages-during-GiST-VACUUM-v15.patch  application/octet-stream  20.7 KB
0001-Physical-GiST-scan-in-VACUUM-v15.patch     application/octet-stream  16.6 KB
