Re: [POC] A better way to expand hash indexes.

From: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] A better way to expand hash indexes.
Date: 2017-03-30 18:36:43
Message-ID: CAD__OujNpOBrPkoD=sHnPU83RTpoExs1P5p=Vmz4joV16ho=Ug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Robert, I have tried to fix the comments given as below.

On Thu, Mar 30, 2017 at 9:19 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think that the macros in hash.h need some more work:
>
> - Pretty much any time you use the argument of a macro, you need to
> parenthesize it in the macro definition to avoid surprises if the
> macros is called using an expression. That isn't done consistently
> here.

--I have tried to fix same in the latest patch.

> - The macros make extensive use of magic numbers like 1, 2, and 3. I
> suggest something like:
>
> #define SPLITPOINT_PHASE_BITS 2
> #define SPLITPOINT_PHASES_PER_GROUP (1 << SPLITPOINT_PHASE_BITS)
>
> And then use SPLITPOINT_PHASE_BITS any place where you're currently
> saying 2. The reference to 3 is really SPLITPOINT_PHASE_BITS + 1.

-- Taken modified same in the latest patch.

> - Many of these macros are only used in one place. Maybe just move
> the computation to that place and get rid of the macro. For example,
> _hash_spareindex() could be written like this:
>
> if (splitpoint_group < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)
> return splitpoint_group;
>
> /* account for single-phase groups */
> splitpoint = SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE;
>
> /* account for completed groups */
> splitpoint += (splitpoint_group -
> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) << SPLITPOINT_PHASE_BITS;
>
> /* account for phases within current group */
> splitpoint += (bucket_num >> (SPLITPOINT_PHASE_BITS + 1)) &
> SPLITPOINT_PHASE_MASK;
>
> return splitpoint;
>
> That eliminates the only use of two complicated macros and is in my
> opinion more clear than what you've currently got.

-- Taken, also rewrote _hash_get_totalbuckets in similar lines.

With that, we will end up with only 2 macros which have some computing code
+/* defines max number of splitpoit phases a hash index can have */
+#define HASH_MAX_SPLITPOINT_GROUP 32
+#define HASH_MAX_SPLITPOINTS \
+ (((HASH_MAX_SPLITPOINT_GROUP - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) * \
+ HASH_SPLITPOINT_PHASES_PER_GRP) + \
+ HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE)
+
+/* given a splitpoint phase get its group */
+#define HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(sp_phase) \
+ (((sp_phase) < HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) ? \
+ (sp_phase) : \
+ ((((sp_phase) - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) >> \
+ HASH_SPLITPOINT_PHASE_BITS) + \
+ HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE))

> - Some of these macros lack clear comments explaining their purpose.

-- I have written some comments to explain the use of the macros.

> - Some of them don't include HASH anywhere in the name, which is
> essential for a header that may easily be included by non-hash index
> code.

-- Fixed, all MACROS are prefixed with HASH

> - The names don't all follow a consistent format. Maybe that's too
> much to hope for at some level, but I think they could be more
> consistent than they are.

-- Fixed, apart from old HASH_MAX_SPLITPOINTS rest all have a prefix
HASH_SPLITPOINT.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
yet_another_expand_hashbucket_efficiently_12.patch application/octet-stream 19.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2017-03-30 18:38:55 strange parallel query behavior after OOM crashes
Previous Message Erik Rijkers 2017-03-30 18:31:34 Re: pgsql: Default monitoring roles