Re: Write Ahead Logging for Hash Indexes

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Write Ahead Logging for Hash Indexes
Date: 2017-03-11 05:20:45
Message-ID: CAA4eK1Jx4_=pbkLXZgCbeUuOhCrnugk+oRKVhM-X4y-w9=bFHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 9, 2017 at 3:11 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Mar 7, 2017 at 6:41 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> Great, thanks. 0001 looks good to me now, so committed.
>>>
>>> Committed 0002.
>>
>> Here are some initial review thoughts on 0003 based on a first read-through.
>
> More thoughts on the main patch:
>
> /*
> + * Change the shared buffer state in critical section,
> + * otherwise any error could make it unrecoverable after
> + * recovery.
> + */
> + START_CRIT_SECTION();
> +
> + /*
> * Insert tuple on new page, using _hash_pgaddtup to ensure
> * correct ordering by hashkey. This is a tad inefficient
> * since we may have to shuffle itempointers repeatedly.
> * Possible future improvement: accumulate all the items for
> * the new page and qsort them before insertion.
> */
> (void) _hash_pgaddtup(rel, nbuf, itemsz, new_itup);
>
> + END_CRIT_SECTION();
>
> No way. You have to start the critical section before making any page
> modifications and keep it alive until all changes have been logged.
>

I think what we need to do here is to accumulate all the tuples that
need to be added to new bucket page till either that page has no more
space or there are no more tuples remaining in an old bucket. Then in
a critical section, add them to the page using _hash_pgaddmultitup and
log the entire new bucket page contents as is currently done in patch
log_split_page(). Now, here we can choose to log the individual
tuples as well instead of a complete page, however not sure if there
is any benefit for doing the same because XLogRecordAssemble() will
anyway remove the empty space from the page. Let me know if you have
something else in mind.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2017-03-11 06:01:58 Re: New CORRESPONDING clause design
Previous Message Eric Ridge 2017-03-11 04:31:16 Re: How to get the 'ctid' from a record type?