Re: Freeze avoidance of very large table.

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Freeze avoidance of very large table.
Date: 2015-04-20 22:00:17
Message-ID: 553576F1.1010809@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/20/15 2:45 AM, Sawada Masahiko wrote:
> Current patch adds new source file src/backend/access/heap/frozenmap.c
> which is quite similar to visibilitymap.c. They have similar code but
> are separated for now. I do refactoring these source code like adding
> bitmap.c, if needed.

My feeling is we'd definitely want this refactored; it looks to be a
whole lot of duplicated code. But before working on that we should get
consensus that a FrozenMap is a good idea.

Are there any meaningful differences between the two, besides the
obvious name changes?

I think there's also a bunch of XLOG stuff that could be refactored too...

> Also, when skipping vacuum by visibility map, we can skip at least
> SKIP_PAGE_THESHOLD consecutive page, but such mechanism is not in
> frozen map.

That's probably something else that can be factored out, since it's
basically the same logic. I suspect we just need to && some of the
checks so we're looking at both FM and VM at the same time.

Other comments...

It would be nice if we didn't need another page bit for FM; do you see
any reasonable way that could happen?

+ * If we didn't pin the visibility(and frozen) map page and the page has
+ * become all visible(and frozen) while we were busy locking the buffer,
+ * or during some subsequent window during which we had it unlocked,
+ * we'll have to unlock and re-lock, to avoid holding the buffer lock
+ * across an I/O. That's a bit unfortunate, especially since we'll now
+ * have to recheck whether the tuple has been locked or updated under us,
+ * but hopefully it won't happen very often.
*/

s/(and frozen)/ or frozen/

+ * Reply XLOG_HEAP3_FROZENMAP record.
s/Reply/Replay/

+ /*
+ * XLogReplayBufferExtended locked the buffer. But frozenmap_set
+ * will handle locking itself.
+ */
+ LockBuffer(fmbuffer, BUFFER_LOCK_UNLOCK);

Doesn't this create a race condition?

Are you sure the bit in finish_heap_swap() is safe? If so, we should add
the same the same for the visibility map too (it certainly better be all
visible if it's frozen...)

+ /*
+ * Current block is all-visible.
+ * If frozen map represents that it's all frozen and this
+ * function is called for freezing tuples, we can skip to
+ * vacuum block.
+ */

I would state this as "Even if scan_all is true, we can skip blocks that
are marked as frozen."

+ if (frozenmap_test(onerel, blkno, &fmbuffer) && scan_all)

I suspect it's faster to reverse those tests (scan_all &&
frozenmap_test())... but why do we even need to look at scan_all? AFAICT
if a block as frozen we can skip it unconditionally.

+ /*
+ * If the un-frozen tuple is remaining in current page and
+ * current page is marked as ALL_FROZEN, we should clear it.
+ */

That needs to NEVER happen. If it does then we're going to consider
tuples as visible/frozen that shouldn't be. We should probably throw an
error here, because it means the heap is now corrupted. At the minimum
it needs to be an assert().

Note that I haven't reviewed all the logic in detail at this point. If
this ends up being refactored it'll be a lot easier to spot logic
problems, so I'll hold off on that for now.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2015-04-20 22:13:38 Re: Turning off HOT/Cleanup sometimes
Previous Message Bruce Momjian 2015-04-20 21:49:18 Re: Turning off HOT/Cleanup sometimes