Hello. I am a student from the GSoC program.
My project is sequential access in GiST vacuum.

The new vacuum has 2 big steps:
a scan of the pages in physical order, and a cleanup after the first step.


The first step scans all pages, builds an information map (a hash map), and adds the pages that need rescanning to a rescan stack.
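
To illustrate the first step, here is a minimal sketch in Python (not the actual C patch). The Page class, the first_pass function and the content of the map are my assumptions for illustration: I assume here that the map stores child block -> parent block, and that a page goes to the rescan stack when tuples were removed from it.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy model of an index page (hypothetical, not the PostgreSQL struct)."""
    is_leaf: bool
    children: list = field(default_factory=list)  # downlinks, inner pages only
    dead_tuples: int = 0   # stand-in for dead index tuples on a leaf
    live_tuples: int = 0


def first_pass(pages):
    """Scan all pages in physical (block number) order.

    Returns (parent_of, rescan_stack):
      parent_of    - assumed content of the map: child block -> parent block
      rescan_stack - blocks that changed and must be visited in step 2
    """
    parent_of = {}
    rescan_stack = []
    for blkno, page in enumerate(pages):  # physical order, no tree descent
        if page.is_leaf:
            if page.dead_tuples > 0:
                page.dead_tuples = 0        # remove dead tuples in place
                rescan_stack.append(blkno)  # page changed, look again later
        else:
            for child in page.children:
                parent_of[child] = blkno    # remember where the downlink lives
    return parent_of, rescan_stack
```

The point of the sketch is that the loop walks block numbers from first to last, so the disk is read sequentially instead of following the tree.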

The second step works only with the pages from the rescan stack, i.e. pages where there were changes. Besides the increased speed, the new version of vacuum also deletes pages. Only leaf pages can be deleted. The process of deleting a page is: 1. delete the link to the page; 2. change rightlinks (if needed); 3. set the deleted flag. I added 2 WAL actions (one for setting the deleted flag and one for changing rightlinks). When I delete links to leaf pages from an inner page, I always keep 1 link to a leaf, to avoid ending up with empty inner pages.
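
The three deletion steps can be sketched like this. Again, this is only a toy model in Python under my own assumptions: Page, try_delete_leaf and parent_of are hypothetical names, there is no locking and no WAL here, and finding the left sibling by a full loop is just a simplification for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    """Toy model of an index page (hypothetical, not the PostgreSQL struct)."""
    is_leaf: bool
    children: list = field(default_factory=list)  # downlinks, inner pages only
    rightlink: Optional[int] = None
    live_tuples: int = 0
    deleted: bool = False


def try_delete_leaf(pages, parent_of, blkno):
    """Delete an empty leaf page; return True if it was deleted."""
    leaf = pages[blkno]
    if not leaf.is_leaf or leaf.deleted or leaf.live_tuples > 0:
        return False  # only empty, not yet deleted leaf pages qualify

    parent = pages[parent_of[blkno]]
    if len(parent.children) <= 1:
        return False  # always keep 1 link, so the inner page is never empty

    # 1. delete the link to the page from the inner page
    parent.children.remove(blkno)

    # 2. change rightlinks (if needed): bypass the page being deleted
    for page in pages:
        if page.rightlink == blkno:
            page.rightlink = leaf.rightlink

    # 3. set the deleted flag
    leaf.deleted = True
    return True
```

Step 2 would run for each block popped from the rescan stack. The check on len(parent.children) is what keeps the last leaf link in the inner page.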

I attach some speed benchmarks.

I compared the old and new versions on my laptop (without an SSD). The test: table "point_tbl" from the regression database. I inserted about 200 million rows, then deleted about 33 million and ran vacuum.

The size of the index is about 18 GB.

old version:

INFO: vacuuming "public.point_tbl"
INFO: scanned index "gpointind" to remove 11184520 row versions
DETAIL: CPU 84.70s/72.26u sec elapsed 27007.14 sec.
INFO: "point_tbl": removed 11184520 row versions in 400715 pages
DETAIL: CPU 3.96s/3.10u sec elapsed 233.12 sec.
INFO: scanned index "gpointind" to remove 11184523 row versions
DETAIL: CPU 87.10s/69.05u sec elapsed 26410.44 sec.
INFO: "point_tbl": removed 11184523 row versions in 400715 pages
DETAIL: CPU 4.23s/3.36u sec elapsed 331.43 sec.
INFO: scanned index "gpointind" to remove 11184523 row versions
DETAIL: CPU 87.65s/65.73u sec elapsed 26230.35 sec.
INFO: "point_tbl": removed 11184523 row versions in 400715 pages
DETAIL: CPU 4.47s/3.41u sec elapsed 342.93 sec.
INFO: scanned index "gpointind" to remove 866 row versions
DETAIL: CPU 79.97s/39.64u sec elapsed 23341.88 sec.
INFO: "point_tbl": removed 866 row versions in 31 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "gpointind" now contains 201326592 row versions in 2336441 pages
DETAIL: 33554432 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.

new version:

INFO: vacuuming "public.point_tbl"
INFO: scanned index "gpointind" to remove 11184520 row versions
DETAIL: CPU 13.00s/27.57u sec elapsed 1864.22 sec.
INFO: "point_tbl": removed 11184520 row versions in 400715 pages
DETAIL: CPU 3.46s/2.86u sec elapsed 214.04 sec.
INFO: scanned index "gpointind" to remove 11184523 row versions
DETAIL: CPU 14.17s/27.02u sec elapsed 2163.67 sec.
INFO: "point_tbl": removed 11184523 row versions in 400715 pages
DETAIL: CPU 3.33s/2.99u sec elapsed 222.60 sec.
INFO: scanned index "gpointind" to remove 11184523 row versions
DETAIL: CPU 11.84s/25.23u sec elapsed 1828.71 sec.
INFO: "point_tbl": removed 11184523 row versions in 400715 pages
DETAIL: CPU 3.44s/2.81u sec elapsed 215.06 sec.
INFO: scanned index "gpointind" to remove 866 row versions
DETAIL: CPU 5.62s/6.68u sec elapsed 176.67 sec.
INFO: "point_tbl": removed 866 row versions in 31 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: index "gpointind" now contains 201326592 row versions in 2336360 pages
DETAIL: 33554432 index row versions were removed.
150833 index pages have been deleted, 150833 are currently reusable.
CPU 5.54s/2.08u sec elapsed 165.61 sec.
INFO: "point_tbl": found 33554432 removable, 201326592 nonremovable row versions in 1202176 out of 1202176 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 73.50s/116.82u sec elapsed 8300.73 sec.
INFO: analyzing "public.point_tbl"
INFO: "point_tbl": scanned 100 of 1202176 pages, containing 16756 live rows and 0 dead rows; 100 rows in sample, 201326601 estimated total rows
VACUUM

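This is a big speedup: each of the three large index scans drops from about 26,000-27,000 sec to about 1,800-2,200 sec (more than 12 times faster), and the last one from 23341.88 sec to 176.67 sec. We can also reuse some pages now: 150833 index pages were deleted and are reusable, versus 0 with the old version.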
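
For anyone who wants to check the ratios, here is a small Python snippet. The numbers are the "elapsed" values of the "scanned index" lines copied from the two logs above; the variable names are just for this example.

```python
# Elapsed seconds of each "scanned index" pass, copied from the logs above.
old_elapsed = [27007.14, 26410.44, 26230.35, 23341.88]
new_elapsed = [1864.22, 2163.67, 1828.71, 176.67]

# Speedup of the new vacuum for each index scan pass.
speedups = [old / new for old, new in zip(old_elapsed, new_elapsed)]

for pass_no, ratio in enumerate(speedups, start=1):
    print(f"index scan {pass_no}: {ratio:.1f}x faster")
```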

Thanks.