Re: Write Ahead Logging for Hash Indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Write Ahead Logging for Hash Indexes
Date: 2017-03-12 02:36:02
Message-ID: CA+TgmoYgQbhohoSHvMZrgAQWvyhxR2qGpKnid6wWbW4v6+hU1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 11, 2017 at 12:20 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> /*
>> + * Change the shared buffer state in critical section,
>> + * otherwise any error could make it unrecoverable after
>> + * recovery.
>> + */
>> + START_CRIT_SECTION();
>> +
>> + /*
>> * Insert tuple on new page, using _hash_pgaddtup to ensure
>> * correct ordering by hashkey. This is a tad inefficient
>> * since we may have to shuffle itempointers repeatedly.
>> * Possible future improvement: accumulate all the items for
>> * the new page and qsort them before insertion.
>> */
>> (void) _hash_pgaddtup(rel, nbuf, itemsz, new_itup);
>>
>> + END_CRIT_SECTION();
>>
>> No way. You have to start the critical section before making any page
>> modifications and keep it alive until all changes have been logged.
>>
>
> I think what we need to do here is to accumulate all the tuples that
> need to be added to new bucket page till either that page has no more
> space or there are no more tuples remaining in an old bucket. Then in
> a critical section, add them to the page using _hash_pgaddmultitup and
> log the entire new bucket page contents as is currently done in patch
> log_split_page().

I agree.

> Now, here we can choose to log the individual
> tuples as well instead of a complete page, however not sure if there
> is any benefit for doing the same because XLogRecordAssemble() will
> anyway remove the empty space from the page. Let me know if you have
> something else in mind.

Well, if you have two pages that are 75% full, and you move a third of
the tuples from one of them into the other, it's going to be about
four times more efficient to log only the moved tuples than the whole
page. But logging the whole filled page wouldn't be wrong, just
somewhat inefficient.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mengxing Liu 2017-03-12 02:39:57 Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
Previous Message Robert Haas 2017-03-12 02:24:48 Re: WIP: Faster Expression Processing v4