Re: Write Ahead Logging for Hash Indexes

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Write Ahead Logging for Hash Indexes
Date: 2017-03-13 12:59:46
Message-ID: CAA4eK1+TuLSo0k9DCFi8U4xTxCe6trUHauxTDp06Ng6U=Bs=Cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 12, 2017 at 8:06 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Mar 11, 2017 at 12:20 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> /*
>>> + * Change the shared buffer state in critical section,
>>> + * otherwise any error could make it unrecoverable after
>>> + * recovery.
>>> + */
>>> + START_CRIT_SECTION();
>>> +
>>> + /*
>>> * Insert tuple on new page, using _hash_pgaddtup to ensure
>>> * correct ordering by hashkey. This is a tad inefficient
>>> * since we may have to shuffle itempointers repeatedly.
>>> * Possible future improvement: accumulate all the items for
>>> * the new page and qsort them before insertion.
>>> */
>>> (void) _hash_pgaddtup(rel, nbuf, itemsz, new_itup);
>>>
>>> + END_CRIT_SECTION();
>>>
>>> No way. You have to start the critical section before making any page
>>> modifications and keep it alive until all changes have been logged.
>>>
>>
>> I think what we need to do here is to accumulate all the tuples that
>> need to be added to new bucket page till either that page has no more
>> space or there are no more tuples remaining in an old bucket. Then in
>> a critical section, add them to the page using _hash_pgaddmultitup and
>> log the entire new bucket page contents as is currently done in patch
>> log_split_page().
>
> I agree.
>

Okay, I have changed like that in the patch.

>> Now, here we can choose to log the individual
>> tuples as well instead of a complete page, however not sure if there
>> is any benefit for doing the same because XLogRecordAssemble() will
>> anyway remove the empty space from the page. Let me know if you have
>> something else in mind.
>
> Well, if you have two pages that are 75% full, and you move a third of
> the tuples from one of them into the other, it's going to be about
> four times more efficient to log only the moved tuples than the whole
> page.
>

I don't see how this could happen during split? I mean if you are
moving 25% tuples from old bucket page to a new bucket page which is
75% full, it will log the new bucket page only after it gets full.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Surafel Temesgen 2017-03-13 13:13:04 Re: New CORRESPONDING clause design
Previous Message Amit Kapila 2017-03-13 12:56:14 Re: Write Ahead Logging for Hash Indexes