Re: Improving replay of XLOG_BTREE_VACUUM records

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Vladimir Borodin <root(at)simply(dot)name>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>
Subject: Re: Improving replay of XLOG_BTREE_VACUUM records
Date: 2015-05-01 23:10:31
Message-ID: 554407E7.5090003@BlueTreble.com
Lists: pgsql-hackers

On 5/1/15 11:19 AM, Vladimir Borodin wrote:
> There are situations in which vacuuming big btree index causes stuck in
> WAL replaying on hot standby servers for quite a long time. I’ve
> described the problem in more details in this thread [0]. Below in that
> thread Kevin Grittner proposed a good way for improving btree scans so
> that btree vacuuming logic could be seriously simplified. Since I don’t
> know when that may happen I’ve done a patch that makes some improvement
> right now. If Kevin or someone else would expand [1] for handling all
> types of btree scans, I suppose, my patch could be thrown away and
> vacuuming logic should be strongly rewritten.

This looks like a good way to address this until the more significant
work can be done.

I'm not a fan of "RBM_ZERO_NO_BM_VALID"; how about RBM_ZERO_BM_INVALID?
or BM_NOT_VALID? Or maybe I'm just trying to impose too much English on
the code; I see the logic to NO_BM_VALID...

+ * RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK, but does not set
+ * BM_VALID bit before returning buffer so that noone could pin it.

It would be better to explain why we want that mode. How about:

RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK but does not set
BM_VALID before returning the buffer. This is done to ensure that no one
can pin the buffer without actually reading the buffer contents in. This
is necessary while replaying XLOG_BTREE_VACUUM records in hot standby.

+	if (mode == RBM_ZERO_NO_BM_VALID)
+		TerminateBufferIO(bufHdr, false, 0);
+	else
+		TerminateBufferIO(bufHdr, false, BM_VALID);

Simply passing in a 0 seems a bit odd to me; is there anywhere else we
do that?
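
To illustrate what I mean, here is a toy sketch of one alternative: compute
the flag bits into a named variable so that "set nothing" reads as a
deliberate choice rather than a bare literal. Everything prefixed with toy_
or TOY_ is a hypothetical stand-in made up for this example; none of it is
the real bufmgr code or the patch, and the real TerminateBufferIO() does
more than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; NOT the real PostgreSQL definitions. */
#define TOY_BM_VALID          (1u << 0)
#define TOY_BM_IO_IN_PROGRESS (1u << 1)

typedef enum
{
    TOY_RBM_ZERO_AND_LOCK,
    TOY_RBM_ZERO_NO_BM_VALID
} ToyReadBufferMode;

typedef struct
{
    uint32_t flags;
} ToyBufferDesc;

/* Toy model of ending buffer I/O: clear the in-progress bit, set the requested bits. */
static void
toy_terminate_buffer_io(ToyBufferDesc *buf, bool clear_dirty, uint32_t set_flag_bits)
{
    (void) clear_dirty;         /* dirty-bit handling is not modelled here */
    buf->flags &= ~TOY_BM_IO_IN_PROGRESS;
    buf->flags |= set_flag_bits;
}

/* Decide in one place which bits a given mode wants set when I/O ends. */
static uint32_t
toy_flags_for_mode(ToyReadBufferMode mode)
{
    /* The "no BM_VALID" mode deliberately leaves the buffer marked invalid. */
    return (mode == TOY_RBM_ZERO_NO_BM_VALID) ? 0 : TOY_BM_VALID;
}

/* Single call site: the flag choice is named instead of passed as a literal. */
static void
toy_finish_read(ToyBufferDesc *buf, ToyReadBufferMode mode)
{
    uint32_t set_flag_bits = toy_flags_for_mode(mode);

    toy_terminate_buffer_io(buf, false, set_flag_bits);
}
```

Whether that is actually clearer than the if/else in the patch is a matter
of taste; the behaviour in this toy model is the same either way.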
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
