Microvacuum support for Hash Index

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Microvacuum support for Hash Index
Date: 2016-10-24 08:51:04
Message-ID: CAE9k0PkRSyzx8dOnokEpUi2A-RFZK72WN0h9DEMv_ut9q6bPRw@mail.gmail.com
Lists: pgsql-hackers

Hi All,

I have added microvacuum support for the hash index access method, and
attached is the v1 patch for the same. The patch basically takes care
of the following things:

1. Firstly, it changes the marking of dead tuples from
'tuple-at-a-time' to 'page-at-a-time' during a hash index scan. For
this, we accumulate the heap TIDs and offsets of all the hash index
tuples flagged by kill_prior_tuple during the scan, and then mark all
the accumulated items as LP_DEAD either while stepping from one page
to another (for scans in both the forward and backward directions),
at the end of the hash index scan, or during a rescan.

2. Secondly, when inserting a tuple into a hash index, if there is not
enough space on the current page, it first removes any dead tuples
found on that page before moving to the next page in the bucket chain
or going for a bucket split. This increases page reusability and
reduces the number of page splits, thereby reducing the overall size
of the hash index.
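
To illustrate the two steps above, here is a toy Python model. It is
only a sketch and not the patch's C code: the names IndexTuple, Page,
Scan and insert are hypothetical, a page is modelled as a fixed-capacity
list, and the 'dead' flag merely stands in for the LP_DEAD hint. The
re-check of the heap TID before marking is also an assumption of this
sketch.

```python
from dataclasses import dataclass, field


@dataclass
class IndexTuple:
    heap_tid: tuple      # (block, offset) of the heap row; illustrative only
    dead: bool = False   # stands in for the LP_DEAD hint


@dataclass
class Page:
    capacity: int
    items: list = field(default_factory=list)

    def has_room(self):
        return len(self.items) < self.capacity

    def prune_dead(self):
        """Remove items flagged dead; return how many were removed."""
        before = len(self.items)
        self.items = [t for t in self.items if not t.dead]
        return before - len(self.items)


class Scan:
    """Remembers killed items and flags them page-at-a-time."""

    def __init__(self):
        self.page = None
        self.killed = []  # (offset, heap_tid) pairs for the current page

    def move_to(self, page):
        # Stepping to another page flushes what was accumulated so far.
        self.flush()
        self.page = page

    def kill_prior_tuple(self, offset, heap_tid):
        # Step 1: only remember the item; nothing is marked yet.
        self.killed.append((offset, heap_tid))

    def flush(self):
        # Also meant to be called at end of scan and on rescan.
        if self.page is not None:
            for offset, heap_tid in self.killed:
                # Assumption: mark only if the slot still holds that item.
                if (offset < len(self.page.items)
                        and self.page.items[offset].heap_tid == heap_tid):
                    self.page.items[offset].dead = True
        self.killed = []


def insert(bucket, tup):
    """Step 2: bucket is a list of pages (primary page + overflow chain)."""
    for page in bucket:
        if not page.has_room():
            page.prune_dead()  # try to reclaim space before moving on
        if page.has_room():
            page.items.append(tup)
            return page
    # Nothing reclaimable: this toy model just adds an overflow page
    # (the real code may instead go for a bucket split).
    new_page = Page(capacity=bucket[0].capacity)
    new_page.items.append(tup)
    bucket.append(new_page)
    return new_page
```

In this model, a full page that holds one item flagged by an earlier
scan can accept a new tuple without the bucket chain growing, which is
the effect the patch is after.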

I have compared the hash index size with and without my patch
(microvacuum_hash_index_v1.patch, attached with this mail) on a
high-end machine at various scale factors, and the results are shown
below. For this test, I created a hash index (pgbench_accounts_aid) on
the aid column of the 'pgbench_accounts' table instead of the primary
key, and the results in the table below are for that index. The patch
with these changes (pgbench.patch) is also attached with this mail.
Moreover, I am using my own script file (file_hash_kill_prior_tuple)
to update the indexed column with the pgbench read-write command. The
script file 'file_hash_kill_prior_tuple' is also attached with this
mail.

Here are some initial test results showing the benefit of this patch:

postgresql.conf and pgbench settings:
autovacuum=off
client counts = 64
run time duration = 15 mins

./pgbench -c $threads -j $threads -T 900 postgres -f ~/file_hash_kill_prior_tuple

Scale Factor   Hash index size @ start   HEAD       HEAD + Patch
10             32 MB                     579 MB     158 MB
50             128 MB                    630 MB     350 MB
100            256 MB                    1255 MB    635 MB
300            1024 MB                   2233 MB    1093 MB

As shown in the above results, at scale factor 10 the hash index is
almost 4 times smaller with my patch (579 MB vs. 158 MB), whereas at
scale factors 50, 100 and 300 it is roughly half the size. This shows
that we can reduce the hash index size to a good extent with this
patch.
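
For reference, the reduction factors can be recomputed from the sizes
reported above with a few lines of Python. This is just arithmetic on
the reported numbers; the names sizes_mb and reduction_factor are made
up for this sketch.

```python
# Index sizes in MB from the reported results:
# scale factor -> (size at start, HEAD, HEAD + patch)
sizes_mb = {
    10: (32, 579, 158),
    50: (128, 630, 350),
    100: (256, 1255, 635),
    300: (1024, 2233, 1093),
}


def reduction_factor(head_mb, patched_mb):
    """How many times smaller the index is with the patch applied."""
    return head_mb / patched_mb


ratios = {
    scale: reduction_factor(head, patched)
    for scale, (_start, head, patched) in sizes_mb.items()
}

for scale, ratio in ratios.items():
    print(f"scale factor {scale}: HEAD / (HEAD + patch) = {ratio:.2f}x")
```

This gives factors of about 3.66, 1.80, 1.98 and 2.04 for scale factors
10, 50, 100 and 300 respectively.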

System specifications:
---------------------------------
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel

Note: The patch (microvacuum_hash_index_v1.patch) is prepared on top
of concurrent_hash_index_v8.patch [1] and wal_hash_index_v5.1.patch [2]
for hash index.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BX%3D8sUd1UCZDZnE3D9CGi9kw%2Bkjxp2Tnw7SX5w8pLBNw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1KE%3D%2BkkowyYD0vmch%3Dph4ND3H1tViAB%2B0cWTHqjZDDfqg%40mail.gmail.com

Attachment                        Content-Type               Size
microvacuum_hash_index_v1.patch   text/x-patch               16.7 KB
file_hash_kill_prior_tuple        application/octet-stream   615 bytes
pgbench.patch                     text/x-patch               788 bytes
