Re: [POC] A better way to expand hash indexes.

From: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] A better way to expand hash indexes.
Date: 2017-03-31 05:15:52
Message-ID: CAD__OujD-iBxm91ZcqziaYftWqJxnFqgMv361V9zke83s6ifBg@mail.gmail.com
Lists: pgsql-hackers

Thanks, I have tried to address all of the comments.

On Fri, Mar 31, 2017 at 8:10 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Mar 30, 2017 at 2:36 PM, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com> wrote:
>> Thanks Robert, I have tried to fix the comments given as below.
>
> Thanks.
>
> Since this changes the on-disk format of hash indexes in an
> incompatible way, I think it should bump HASH_VERSION. (Hopefully
> that doesn't preclude REINDEX?) pg_upgrade should probably also be
> patched to issue a warning if upgrading from a version < 10 to a
> version >= 10 whenever hash indexes are present; I thought we had
> similar cases already, but I don't see them at the moment. Maybe we
> can get Bruce or someone to give us some advice on exactly what should
> be done here.

As of now, increasing the version asks us to REINDEX (the metapage access
code verifies whether we are on the right version):
postgres=# set enable_seqscan= off;
SET
postgres=# select * from t1 where i = 10;
ERROR: index "hash2" has wrong hash version
HINT: Please REINDEX it.
postgres=# insert into t1 values(10);
ERROR: index "hash2" has wrong hash version
HINT: Please REINDEX it.

postgres=# REINDEX INDEX hash2;
REINDEX
postgres=# select * from t1 where i = 10;
i
----
10
(1 row)
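
For reference, the error above comes from the metapage validation path.
A rough sketch of the existing check in _hash_checkpage() (hashutil.c),
reproduced from memory rather than from the patch:

    /*
     * Sketch (from memory) of the metapage version check that raises the
     * error shown above; bumping HASH_VERSION makes pre-patch indexes
     * fail this check until they are REINDEXed.
     */
    if (metap->hashm_version != HASH_VERSION)
        ereport(ERROR,
                (errcode(ERRCODE_INDEX_CORRUPTED),
                 errmsg("index \"%s\" has wrong hash version",
                        RelationGetRelationName(rel)),
                 errhint("Please REINDEX it.")));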

Last time we changed this version from 1 to 2 in commit
4adc2f72a4ccd6e55e594aca837f09130a6af62b; from the logs I do not see any
upgrade-specific changes made at that time.

Hi Bruce, can you please advise us on what should be done here?

> In a couple of places, you say that a splitpoint group has 4 slots
> rather than 4 phases.

--Fixed

> I think that in _hash_get_totalbuckets(), you should use blah &
> HASH_SPLITPOINT_PHASE_MASK rather than blah %
> HASH_SPLITPOINT_PHASES_PER_GRP for consistency with _hash_spareindex
> and, perhaps, speed. Similarly, instead of blah /
> HASH_SPLITPOINT_PHASES_PER_GRP, use blah >>
> HASH_SPLITPOINT_PHASE_BITS.

--Fixed
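
As a side note on the equivalence: since HASH_SPLITPOINT_PHASES_PER_GRP is
a power of two, the mask/shift forms give exactly the same results as
modulo/division. A minimal standalone sketch (the constant values here are
my assumptions, mirroring the patch's 4-phases-per-group layout):

    #include <assert.h>
    #include <stdint.h>

    /* Assumed values mirroring the patch: 4 phases per splitpoint group. */
    #define HASH_SPLITPOINT_PHASE_BITS      2
    #define HASH_SPLITPOINT_PHASES_PER_GRP  (1 << HASH_SPLITPOINT_PHASE_BITS)
    #define HASH_SPLITPOINT_PHASE_MASK      (HASH_SPLITPOINT_PHASES_PER_GRP - 1)

    int
    main(void)
    {
        for (uint32_t phase = 0; phase < 1024; phase++)
        {
            /* mask == modulo and shift == division for power-of-two divisors */
            assert((phase & HASH_SPLITPOINT_PHASE_MASK) ==
                   (phase % HASH_SPLITPOINT_PHASES_PER_GRP));
            assert((phase >> HASH_SPLITPOINT_PHASE_BITS) ==
                   (phase / HASH_SPLITPOINT_PHASES_PER_GRP));
        }
        return 0;
    }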

>
> buckets_toadd is punctuated oddly. buckets_to_add? Instead of
> hand-calculating this, how about calculating it as
> _hash_get_totalbuckets(spare_ndx) - _hash_get_totalbuckets(spare_ndx -
> 1)?

I think this should do that, considering new_bucket is nothing but the
1-based max_buckets:
buckets_to_add = _hash_get_totalbuckets(spare_ndx) - new_bucket;
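
To illustrate why the two formulations agree: at the point where we cross
into a new splitpoint phase, new_bucket is exactly the total number of
buckets in existence as of the previous phase, i.e.
_hash_get_totalbuckets(spare_ndx - 1). A standalone sketch, where
total_buckets_at() is a hypothetical stand-in for _hash_get_totalbuckets()
under the assumed layout (10 single-phase groups, then 4 equal phases per
group):

    #include <assert.h>
    #include <stdint.h>

    /* Assumed constants mirroring the patch. */
    #define GROUPS_WITH_ONE_PHASE  10
    #define PHASES_PER_GRP         4

    /* Hypothetical stand-in for _hash_get_totalbuckets(). */
    static uint32_t
    total_buckets_at(uint32_t phase)
    {
        if (phase < GROUPS_WITH_ONE_PHASE)
            return (uint32_t) 1 << phase;   /* plain doubling, as today */

        uint32_t group = GROUPS_WITH_ONE_PHASE +
            (phase - GROUPS_WITH_ONE_PHASE) / PHASES_PER_GRP;
        uint32_t phase_in_group = (phase - GROUPS_WITH_ONE_PHASE) % PHASES_PER_GRP;
        uint32_t before_group = (uint32_t) 1 << (group - 1);

        /* each phase adds one quarter of the group's new buckets */
        return before_group + (before_group / PHASES_PER_GRP) * (phase_in_group + 1);
    }

    int
    main(void)
    {
        for (uint32_t spare_ndx = 1; spare_ndx < 40; spare_ndx++)
        {
            /* new_bucket when phase spare_ndx is first reached */
            uint32_t new_bucket = total_buckets_at(spare_ndx - 1);
            uint32_t buckets_to_add = total_buckets_at(spare_ndx) - new_bucket;

            if (spare_ndx < GROUPS_WITH_ONE_PHASE)
                assert(buckets_to_add == new_bucket);   /* doubling */
            else
            {
                /* within a 4-phase group, each phase adds a quarter of the
                 * buckets the whole group will eventually add */
                uint32_t group = GROUPS_WITH_ONE_PHASE +
                    (spare_ndx - GROUPS_WITH_ONE_PHASE) / PHASES_PER_GRP;
                assert(buckets_to_add ==
                       ((uint32_t) 1 << (group - 1)) / PHASES_PER_GRP);
            }
        }
        return 0;
    }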

That change also lets me do away with

+#define HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(sp_phase) \
+ (((sp_phase) < HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) ? \
+ (sp_phase) : \
+ ((((sp_phase) - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) >> \
+ HASH_SPLITPOINT_PHASE_BITS) + \
+ HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE))

as it is now used in only one place, _hash_get_totalbuckets. I also think
the comments above can be removed now, since we have removed the code
related to multi-phased allocation there:

+ * The number of buckets in the new splitpoint group is equal to the
+ * total number already in existence. But we do not allocate them at
+ * once. Each splitpoint group will have 4 phases, we distribute the
+ * buckets equally among them. So we allocate only one fourth of total
+ * buckets in new splitpoint group at a time to consume one phase after
+ * another.
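
For clarity, the phase-to-group mapping that macro implements looks like
this (a standalone sketch; HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE = 10 is
my assumption based on the patch): the first 10 splitpoint groups have a
single phase each, and every later group spans 4 consecutive phases.

    #include <assert.h>

    /* Assumed constants mirroring the patch. */
    #define HASH_SPLITPOINT_PHASE_BITS             2
    #define HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE  10

    #define HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(sp_phase) \
        (((sp_phase) < HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) ? \
         (sp_phase) : \
         ((((sp_phase) - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) >> \
           HASH_SPLITPOINT_PHASE_BITS) + \
          HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE))

    int
    main(void)
    {
        /* phases 0..9 are their own groups ... */
        assert(HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(0) == 0);
        assert(HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(9) == 9);
        /* ... then phases 10..13 all belong to group 10, 14..17 to group 11 */
        assert(HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(10) == 10);
        assert(HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(13) == 10);
        assert(HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(14) == 11);
        return 0;
    }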

> + uint32 splitpoint_group = 0;
>
> Don't need the = 0 here; the next reference to this variable is an
> unconditional initialization.

--Fixed, with new code splitpoint_group is not needed.

>
> + */
> +
> + splitpoint_group = HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(spare_ndx);
>
> I would delete the blank line.

--Fixed.

>
> - * should start from ((2 ^ i) + metap->hashm_spares[i - 1] + 1).
> + * should start from
> + * (_hash_get_totalbuckets(i) + metap->hashm_spares[i - 1] + 1).
>
> Won't survive pgindent.

--Fixed as pgindent has suggested.

>
> - * The number of buckets in the new splitpoint is equal to the total
> - * number already in existence, i.e. new_bucket. Currently this maps
> - * one-to-one to blocks required, but someday we may need a more
> - * complicated calculation here. We treat allocation of buckets as a
> - * separate WAL-logged action. Even if we fail after this operation,
> - * won't leak bucket pages; rather, the next split will consume this
> - * space. In any case, even without failure we don't use all the space
> - * in one split operation.
> + * The number of buckets in the new splitpoint group is equal to the
> + * total number already in existence. But we do not allocate them at
> + * once. Each splitpoint group will have 4 slots, we distribute the
> + * buckets equally among them. So we allocate only one fourth of total
> + * buckets in new splitpoint group at a time to consume one phase after
> + * another. We treat allocation of buckets as a separate WAL-logged
> + * action. Even if we fail after this operation, won't leak bucket
> + * pages; rather, the next split will consume this space. In any case,
> + * even without failure we don't use all the space in one split
> + * operation.
>
> I think here you should break this into two paragraphs -- start a new
> paragraph with the sentence that begins "We treat..."

-- Fixed. I have removed the first paragraph, as it appeared to be extra
information now that we do
buckets_to_add = _hash_get_totalbuckets(spare_ndx) - new_bucket;

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
yet_another_expand_hashbucket_efficiently_14.patch application/octet-stream 19.2 KB
