From: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: [POC] A better way to expand hash indexes.
Date: 2017-02-17 13:51:32
Message-ID: CAD__OuhG6F1gQLCgMQNnMNgoCvOLQZz9zKYJQNYvYmmJoM42gA@mail.gmail.com
Lists: pgsql-hackers

Hi all,

As of now, we expand a hash index by doubling the number of bucket
blocks, but unfortunately those blocks will not be used immediately.
So I thought that if we can defer part of the bucket block allocation,
the hash index can grow much more efficiently. I have written a POC
patch which does the following:
Say at overflow point 'i' the old code needs to add x = 2^(i-1) new
buckets. I think we can do this addition of buckets in a controlled
way: instead of adding all 'x' bucket blocks at once, the patch adds
x/4 blocks at a time, and once those blocks are consumed it adds the
next installment of x/4 blocks. This continues until all 'x' blocks of
overflow point 'i' are allocated. My test results show that the index
size grows much more gradually with this patch.
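
To make the schedule concrete, below is a small stand-alone C sketch
(illustration only, not code from the patch) that prints how many
bucket blocks each scheme hands out at a given overflow point.
Skipping the quartering while x is still smaller than 4 is an
assumption of the sketch, not necessarily what the patch does.

#include <stdio.h>

int
main(void)
{
    for (int i = 1; i <= 10; i++)
    {
        unsigned    x = 1U << (i - 1);  /* buckets added at overflow point i */

        printf("overflow point %2d: old code adds %4u buckets at once; ", i, x);

        if (x < 4)
            printf("sketch adds %u at once (too few to quarter)\n", x);
        else
            printf("sketch adds 4 installments of %u\n", x / 4);
    }
    return 0;
}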

Note: This patch is just a POC. It may have bugs, and I still have to
update the comments in the code and the README to reflect the new
changes.

Test:
create table t1(t int);
create index i1 on t1 using hash(t);
Then add rows incrementally, 10 million at a time:
insert into t1 select generate_series(1, 10000000);

records       base index    new patch
(millions)    size (MB)     index size (MB)
----------    ----------    --------------
        10           384               384
        20           768               768
        30          1417              1161
        40          1531              1531
        50          2556              1788
        60          2785              2273
        70          2963              2709
        80          3060              3061
        90          5111              3575
       100          5111              3575

To implement such an incremental addition of bucket blocks, I have to
increase the size of the hashm_spares array in the meta page by four
times. The mapping methods, which map a total number of buckets to a
split-point position in the hashm_spares array, also need to be
changed. These changes create backward-compatibility issues.
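
For illustration, here is a minimal sketch of the kind of forward
mapping that has to change: from a bucket number to its slot in the
enlarged hashm_spares array. The function name bucket_to_spares_slot
and the PHASES_PER_SPLITPOINT constant are made up for the sketch, and
keeping every split point below 3 (where x/4 < 1) entirely in its
first slot is my assumption, not necessarily what the patch does.

#include <stdint.h>

#define PHASES_PER_SPLITPOINT 4     /* each old overflow point becomes 4 slots */

/*
 * Map a 0-based bucket number to its slot in the enlarged hashm_spares[]
 * array.  Split point i covers buckets 2^(i-1) .. 2^i - 1 (bucket 0 is
 * split point 0); split points large enough to quarter are divided into
 * four phases of 2^(i-1)/4 buckets each.
 */
uint32_t
bucket_to_spares_slot(uint32_t bucket)
{
    uint32_t    splitpoint = 0;
    uint32_t    phase = 0;

    while ((UINT32_C(1) << splitpoint) <= bucket)
        splitpoint++;

    if (splitpoint >= 3)        /* 2^(i-1)/4 >= 1 only from split point 3 on */
    {
        uint32_t    first_bucket = UINT32_C(1) << (splitpoint - 1);
        uint32_t    phase_size = first_bucket / PHASES_PER_SPLITPOINT;

        phase = (bucket - first_bucket) / phase_size;
    }

    return splitpoint * PHASES_PER_SPLITPOINT + phase;
}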

Implementation Details in brief:
=======================
Each element of hashm_spares (which we call an overflow point) is
expanded into 4 slots {0, 1, 2, 3}. If 'x' (a power-of-2 number) is
the total number of buckets to be added at the overflow point, we add
only a quarter of it (x/4) to each slot; once we have consumed the
previously added blocks, we add the next quarter of buckets, and so
on.
As in the old code, the new hashm_spares[i] stores the total number of
overflow pages allocated between those bucket allocations.
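
Below is a companion sketch (same made-up constant, same
small-split-point assumption as above) for the reverse direction: how
many buckets exist once every installment up to and including a given
hashm_spares slot has been allocated. A mapping of roughly this shape
is what lets the split code decide when the next x/4 installment is
due.

#include <stdint.h>

#define PHASES_PER_SPLITPOINT 4     /* same constant as in the sketch above */

/*
 * Total number of buckets that exist once every installment up to and
 * including the given hashm_spares[] slot has been allocated.  Split
 * points below 3 are too small to quarter, so all of their buckets are
 * counted against their first slot.
 */
uint32_t
buckets_up_to_slot(uint32_t slot)
{
    uint32_t    splitpoint = slot / PHASES_PER_SPLITPOINT;
    uint32_t    phase = slot % PHASES_PER_SPLITPOINT;
    uint32_t    buckets;

    if (splitpoint == 0)
        return 1;               /* split point 0 holds only bucket 0 */

    /* every bucket up to the end of the previous split point ... */
    buckets = UINT32_C(1) << (splitpoint - 1);

    /* ... plus the installments already allocated in this split point */
    if (splitpoint >= 3)
        buckets += (phase + 1) *
            ((UINT32_C(1) << (splitpoint - 1)) / PHASES_PER_SPLITPOINT);
    else
        buckets += UINT32_C(1) << (splitpoint - 1);

    return buckets;
}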

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
expand_hashbucket_efficiently_01 application/octet-stream 6.7 KB
