Re: Doc chapter for Hash Indexes

From: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pryzby(at)telsasoft(dot)com
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Doc chapter for Hash Indexes
Date: 2021-06-22 09:00:51
Message-ID: CANbhV-Gg_wAawu0Bm3+yOnMnySCOQZRQ==U0AWrgC1-bjH+gzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 22, 2021 at 7:15 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Jun 21, 2021 at 6:38 PM Simon Riggs
> <simon(dot)riggs(at)enterprisedb(dot)com> wrote:
> >
> > New chapter for Hash Indexes, designed to help users understand how
> > they work and when to use them.
> >
> > Mostly newly written, but a few paras lifted from README when they were helpful.
> >
>
> Few comments
> ==============
> 1.
> + Hash indexes are best optimized for SELECTs and UPDATEs using equality
> + scans on larger tables.
>
> Is there a reason to mention Selects and Updates but not Deletes?

Deletes decrease the number of rows, so must eventually be matched with inserts.
So deletes imply inserts.

Perhaps it should say "update-heavy"

> 2.
> + Like B-Trees, hash indexes perform simple index tuple deletion. This
> + is a deferred maintenance operation that deletes index tuples that are
> + known to be safe to delete (those whose item identifier's LP_DEAD bit
> + is already set). This is performed speculatively upon each insert,
> + though may not succeed if the page is pinned by another backend.
>
> It is not very clear to me when we perform the simple index tuple
> deletion from the above sentence. We perform it when there is no space
> to accommodate a new tuple on the bucket page and as a result, we
> might need to create an overflow page. Basically, I am not sure
> saying: "This is performed speculatively upon each insert .." is
> helpful.

OK, thanks, will reword.

> 3.
> + incrementally expanded. When a new bucket is to be added to the index,
> + exactly one existing bucket will need to be "split", with some of its
> + tuples being transferred to the new bucket according to the updated
> + key-to-bucket-number mapping. This is essentially the same hash table
>
> In most places, the patch has used a single space after the full stop
> but at some places like above, it has used two spaces after full stop.
> I think it is better to be consistent.

OK

> 4.
> This is essentially the same hash table
> + management technique embodied in src/backend/utils/hash/dynahash.c for
> + in-memory hash tables used within PostgreSQL internals.
>
> I am not sure if there is a need to mention this in the user-facing
> doc. I think README is a better place for this.

OK, will remove. Thanks

I've reworded most things from both Amit and Justin; thanks for your reviews.

I attach both clean and compare versions.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachment Content-Type Size
doc_hash_index.v1-v2.diff application/octet-stream 5.5 KB
doc_hash_index.v2.patch application/octet-stream 7.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-06-22 09:53:26 Re: disfavoring unparameterized nested loops
Previous Message Thomas Munro 2021-06-22 08:53:59 Cosmic ray hits integerset