Re: why do hash index builds use smgrextend() for new splitpoint pages

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: why do hash index builds use smgrextend() for new splitpoint pages
Date: 2022-02-25 21:31:23
Message-ID: CAAKRu_ZnNM-FAYNOsgFD6JT9_c0Dc5b61atykpy_9sAhevLh9g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 24, 2022 at 10:24 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Feb 25, 2022 at 4:41 AM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> >
> > I'm trying to understand why hash indexes are built primarily in shared
> > buffers except when allocating a new splitpoint's worth of bucket pages
> > -- which is done with smgrextend() directly in _hash_alloc_buckets().
> >
> > Is this just so that the value returned by smgrnblocks() includes the
> > new splitpoint's worth of bucket pages?
> >
> > All writes of tuple data to pages in this new splitpoint will go
> > through shared buffers (via _hash_getnewbuf()).
> >
> > I asked this and got some thoughts from Robert in [1], but I still don't
> > really get it.
> >
> > When a new page is needed during the hash index build, why can't
> > _hash_expandtable() just call ReadBufferExtended() with P_NEW instead of
> > _hash_getnewbuf()? Does it have to do with the BUCKET_TO_BLKNO mapping?
> >
>
> We allocate the chunk of pages (power-of-2 groups) at the time of
> split which allows them to appear consecutively in an index. This
> helps us to compute the physical block number from bucket number
> easily (BUCKET_TO_BLKNO mapping) with some minimal control
> information.

Got it, thanks.
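
To make the mapping concrete, here is a rough sketch of how a bucket number
can be turned into a block number when bucket pages are allocated in
consecutive power-of-2 groups. It is a simplified model, not the real
BUCKET_TO_BLKNO macro: SketchMeta, spares[], ceil_log2() and
bucket_to_blkno() are hypothetical names made up for illustration, and the
actual macro in the PostgreSQL source may differ in detail. The assumptions
are that block 0 is the metapage and that spares[s] counts the overflow
pages allocated before the bucket pages of splitpoint s + 1.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SPLITPOINTS 32		/* hypothetical limit for this sketch */

/* Hypothetical, simplified stand-in for the hash metapage. */
typedef struct SketchMeta
{
	uint32_t	max_bucket;		/* highest bucket number in use */
	/* spares[s]: overflow pages allocated before splitpoint s + 1 */
	uint32_t	spares[MAX_SPLITPOINTS];
} SketchMeta;

/* Smallest i such that (1 << i) >= num, i.e. ceil(log2(num)). */
static uint32_t
ceil_log2(uint32_t num)
{
	uint32_t	i = 0;
	uint64_t	limit = 1;

	while (limit < num)
	{
		limit <<= 1;
		i++;
	}
	return i;
}

/*
 * Map a bucket number to a block number.  Because each splitpoint's
 * buckets are physically consecutive, the only extra information needed
 * is how many overflow pages sit in front of that splitpoint.
 */
static uint32_t
bucket_to_blkno(const SketchMeta *meta, uint32_t bucket)
{
	uint32_t	preceding_overflow = 0;

	assert(bucket <= meta->max_bucket);

	/* Bucket 0 sits directly after the metapage; nothing precedes it. */
	if (bucket > 0)
		preceding_overflow = meta->spares[ceil_log2(bucket + 1) - 1];

	/* The trailing + 1 skips the metapage at block 0. */
	return bucket + preceding_overflow + 1;
}
```

In this model, with spares = {0, 1, 1} (one overflow page allocated between
bucket 1 and the splitpoint holding buckets 2 and 3), buckets 2 and 3 land
on blocks 4 and 5. The lookup needs only the small spares[] array, which is
the "minimal control information" mentioned above.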

Since _hash_alloc_buckets() WAL-logs the last page of the
splitpoint, is it safe to skip the smgrimmedsync()? What if the last
page of the splitpoint doesn't end up having any tuples added to it
during the index build, the redo pointer moves past the WAL record for
this page, and then there is a crash before the page makes it to
permanent storage? Does it matter that this page is lost? If not, then
why bother WAL-logging it?

- Melanie
