Boundary value check in lazy_tid_reaped()

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Boundary value check in lazy_tid_reaped()
Date: 2020-08-30 11:07:38
Message-ID: CA+fd4k76j8jKzJzcx8UqEugvayaMSnQz0iLUt_XgBp-_-bd22A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

In the current lazy vacuum implementation, some index AMs such as
btree indexes call lazy_tid_reaped() for each index tuples during
ambulkdelete to check if the index tuple points to the (collected)
garbage tuple. In that function, we simply call bsearch(3) but we
should be able to know the result without bsearch(3) if the index
tuple points to the heap tuple that is out of range of the collected
garbage tuples. I've checked whether bsearch(3) does the boundary
value check or not but looking at the bsearch(3) implementation of
FreeBSD libc[1], there is not. The same is true for glibc.

So my proposal is to add boundary value check in lazy_tid_reaped()
before executing bsearch(3). This will help when index vacuum happens
multiple times or when garbage tuples are concentrated to a narrow
range.

I’ve done three performance tests. maintenance_work_mem is set to 64MB
with which we can collect about 11 million tuples at maximum and the
table size is 10GB. The table has one btree index. Here is the average
of index vacuum (ambulkdelete) execution time of three trials:

* Test case 1

I made all pages dirty with 15 million garbage tuples to invoke index
vacuum twice.

HEAD: 164.7 s
Patched: 138.81 s

* Test case 2

I made all pages dirty with 10 million garbage tuples to invoke index
vacuum only once, checking the performance degradation.

HEAD: 127.8 s
Patched: 128.25 s

* Test case 3

I made a certain range of table dirty with 1 million garbage tuples.

HEAD: 31.07 s
Patched: 9.41 s

I thought that we can have a generic function wrapping bsearch(3) that
does boundary value checks and then does bsearch(3) so that we can use
it in other similar places as well. But the attached patch doesn't do
that as I'd like to hear opinions on the proposal first.

Feedback is very welcome.

Regards,

[1] https://svnweb.freebsd.org/base/head/lib/libc/stdlib/bsearch.c?revision=326025&view=markup

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
vacuum_boundary_check.patch application/octet-stream 3.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2020-08-30 12:04:46 Re: Yet another fast GiST build (typo)
Previous Message Dilip Kumar 2020-08-30 09:13:20 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions