Re: [POC] A better way to expand hash indexes.

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] A better way to expand hash indexes.
Date: 2017-03-29 04:42:18
Message-ID: CAA4eK1K4FDbgyWXGhvVeMd5yQpJiFDTpXij05=iawv_G8iU5QQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 28, 2017 at 8:56 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Mar 28, 2017 at 5:00 AM, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com> wrote:
>>> This will go wrong for split point group zero. In general, I feel if
>>> you handle computation for split groups lesser than
>>> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your
>>> macros will become much simpler and less error prone.
>>
>> Fixed, apart from SPLITPOINT_PHASE_TO_SPLITPOINT_GRP rest all macros
>> only handle multi phase group. The SPLITPOINT_PHASE_TO_SPLITPOINT_GRP
>> is used in one more place at expand index so thought kepeping it as it
>> is is better.
>
> I wonder if we should consider increasing
> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE somewhat. For example, split
> point 4 is responsible for allocating only 16 new buckets = 128kB;
> doing those in four groups of two (16kB) seems fairly pointless.
> Suppose we start applying this technique beginning around splitpoint 9
> or 10. Breaking 1024 new buckets * 8kB = 8MB of index growth into 4
> phases might save enough to be worthwhile.
>

10 sounds better point to start allocating in phases.

> Of course, even if we decide to apply this even for very small
> splitpoints, it probably doesn't cost us anything other than some
> space in the metapage. But maybe saving space in the metapage isn't
> such a bad idea anyway.
>

Yeah metapage space is scarce, so lets try to save it as much possible.

Few other comments:
+/*
+ * This is just a trick to save a division operation. If you look into the
+ * bitmap of 0-based bucket_num 2nd and 3rd most significant bit will indicate
+ * which phase of allocation the bucket_num belongs to with in the group. This
+ * is because at every splitpoint group we allocate (2 ^ x) buckets and we have
+ * divided the allocation process into 4 equal phases. This macro returns value
+ * from 0 to 3.
+ */

+#define SPLITPOINT_PHASES_WITHIN_GROUP(sp_g, bucket_num) \
+ (((bucket_num) >> (sp_g - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)) & \
+ SPLITPOINT_PHASE_MASK)

This won't work if we change SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE to
number other than 3. I think you should change it so that it can work
with any value of SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE.

I think you should name this define as SPLITPOINT_PHASE_WITHIN_GROUP
as this refers to only one particular phase within group.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2017-03-29 05:21:12 Re: Multiple false-positive warnings from Valgrind
Previous Message Fabien COELHO 2017-03-29 04:18:13 Re: \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)