Re: Write Ahead Logging for Hash Indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Write Ahead Logging for Hash Indexes
Date: 2017-03-08 18:55:44
Message-ID: CA+Tgmoae4FjZ5wmUd6kCYsUNRYY-UwGUzYSrNNFvQfx1Vrik-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 8, 2017 at 7:45 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> I still think this is a bad idea. Releasing and reacquiring the lock
>> on the master doesn't prevent the standby from seeing intermediate
>> states; the comment, if I understand correctly, is just plain wrong.
> I can remove this release and reacquire stuff, but first, let me try
> to explain what intermediate state I am talking here. If we don't
> have a release/reacquire lock here, standby could see an empty
> overflow page at that location whereas master will be able to see the
> new page only after the insertion of tuple/'s in that bucket is
> finished.

That's true, but you keep speaking as if the release and reacquire
prevents the standby from seeing that state. It doesn't. Quite the
opposite: it allows the master to see that intermediate state as well.
I don't think there is a problem with the standby seeing an
intermediate state that can't be seen on the master, provided we've
set up the code to handle that intermediate state properly, which
hopefully we have.

>> + * Initialise the freed overflow page, here we can't complete zeroed the
>>
>> Don't use British spelling, and don't use a comma to join two
>> sentences. Also "here we can't complete zeroed" doesn't seem right; I
>> don't know what that means.
>
> What the comment means to say is that we can't initialize page as zero
> (something like below)
> MemSet(ovflpage, 0, BufferGetPageSize(ovflbuf));
>
> typo in sentence
> /complete/completely

I think you can just say we initialize the page. Saying we can't
initializing it by zeroing it seems pretty obvious, both from the code
and from the general theory of how stuff works. If you did want to
explain it, I think you'd write something like "Just zeroing the page
won't work, because <reasons>."

>> + /*
>> + * We need to release and if required reacquire the lock on
>> + * rbuf to ensure that standby shouldn't see an intermediate
>> + * state of it. If we don't release the lock, after replay of
>> + * XLOG_HASH_MOVE_PAGE_CONTENTS on standby users will
>> be able to
>> + * view the results of partial deletion on rblkno.
>> + */
>> + LockBuffer(rbuf, BUFFER_LOCK_UNLOCK);
>>
>> If you DO release the lock, this will STILL be true, because what
>> matters on the standby is what the redo code does.
>
> That's right, what I intend to do here is to release the lock in
> master where it will be released in standby. In this case, we can't
> ensure what user can see in master is same as in standby after this
> WAL record is replayed, because in master we have exclusive lock on
> write buf, so no one can read the contents of read buf (the location
> of read buf will be after write buf) whereas, in standby, it will be
> possible to read the contents of the read buf. I think this is not a
> correctness issue, so we might want to leave as it is, what do you
> say?

Well, as in the case above, you're doing extra work to make sure that
every state which can be observed on the standby can also be observed
on the master. But that has no benefit, and it does have a cost: you
have to release and reacquire a lock that you could otherwise retain,
saving CPU cycles and code complexity. So I think the way you've got
it is not good.

>> + * We can log the exact changes made to meta page, however as no
>> + * concurrent operation could see the index during the replay of this
>> + * record, we can perform the operations during replay as they are
>> + * done here.
>>
>> Don't use a comma to join two sentences. Also, I don't understand
>> anything up to the "however".
>>
>
> I mean the below changes made to meta buf (refer existing code):
> metap->hashm_mapp[metap->hashm_nmaps] = num_buckets + 1;
>
> metap->hashm_nmaps++;
>
> I think it will be clear if you look both the DO and REDO operation of
> XLOG_HASH_INIT_BITMAP_PAGE. Let me know if you still think that the
> comment needs to be changed?

I think it's not very clear. I think what this boils down to is that
this is another place where you've got a comment to explain that you
didn't log the entire metapage, but I don't think anyone would expect
that anyway, so it doesn't seem like a very important thing to
explain. If you want a comment here, maybe something like /* This is
safe only because nobody else can be modifying the index at this
stage; it's only visible to the transaction that is creating it */

>> + * Set pd_lower just past the end of the metadata. This is to log
>> + * full_page_image of metapage in xloginsert.c.
>>
>> Why the underscores?
>>
>> + * won't be a leak on standby, as next split will consume this space.
>
> No specific reason, just trying to resemble full_page_writes, but can
> change if you feel it doesn't make much sense.

Well, usually I don't separate_words_in_a_sentence_with_underscores.
It makes sense to do that if you are referring to an identifier in the
code with that name, but otehrwise not.

>> + * won't be a leak on standby, as next split will consume this space.
>
>> The master and the standby had better be the same,
>
> That's right.
>
>> so I don't know
>> what it means to talk about a leak on the standby.
>>
>
> I just want to say that even if we fail after allocation of buckets
> and before the split operation is completed, the newly allocated
> buckets will neither be leaked on master nor on standby. Do you think
> we should add a comment for same or is it too obvious? How about
> changing it to something like:
>
> "Even if we fail after this operation, it won't leak buckets, as next
> split will consume this space. In any case, even without failure, we
> don't use all the space in one split operation."

Sure.

>> + * Change the shared buffer state in critical section,
>> + * otherwise any error could make it unrecoverable after
>> + * recovery.
>>
>> Uh, what? We only recover things during recovery. Nothing gets
>> recovered after recovery.
>
> Right, how about "Change the shared buffer state in a critical
> section, otherwise any error could make the page unrecoverable."

Sure.

>> Maybe this could get some tests, via the isolation tester, for things
>> like snapshot too old?
>
> Okay, I can try, but note that currently there is no test related to
> "snapshot too old" for any other indexes.

Wow, that's surprising. It seems the snapshot_too_old test only
checks that this works for a table that has no indexes. Have you,
anyway, tested it manually?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2017-03-08 19:22:55 Re: background sessions
Previous Message Oleg Bartunov 2017-03-08 18:36:13 Re: SQL/JSON in PostgreSQL