Freeze avoidance of very large table.

From: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Freeze avoidance of very large table.
Date: 2015-04-21 14:59:45
Message-ID: CAD21AoDLNoN+QCF8G8rEN8uR=gtyxofwkccPr0z+hsv7CqtT8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 21, 2015 at 7:00 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> On 4/20/15 2:45 AM, Sawada Masahiko wrote:
>>
>> Current patch adds new source file src/backend/access/heap/frozenmap.c
>> which is quite similar to visibilitymap.c. They have similar code but
>> are separated for now. I do refactoring these source code like adding
>> bitmap.c, if needed.
>

Thank you for having a look this patch.

>
> My feeling is we'd definitely want this refactored; it looks to be a whole
> lot of duplicated code. But before working on that we should get consensus
> that a FrozenMap is a good idea.

Yes, we need to get consensus about FrozenMap before starting work on.
In addition to comment you pointed out, I noticed that one problems I
should address, that a bit of FrozenMap need to be cleared on deletion and
(i.g. xmax is set).
The page as frozen could have the dead tuple for now, but I think to change
to that the frozen page guarantees that page is all frozen *and* all
visible.

> Are there any meaningful differences between the two, besides the obvious
> name changes?

No, there aren't.

> I think there's also a bunch of XLOG stuff that could be refactored too...

I agree with you.

>> Also, when skipping vacuum by visibility map, we can skip at least
>> SKIP_PAGE_THESHOLD consecutive page, but such mechanism is not in
>> frozen map.
>
>
> That's probably something else that can be factored out, since it's
> basically the same logic. I suspect we just need to && some of the checks
so
> we're looking at both FM and VM at the same time.

FrozenMap is used to skip scan only when anti-wrapping vacuum or freezing
all tuples (i.g scan_all is true).
The normal vacuum uses only VM, doesn't use FM for now.

> Other comments...
>
> It would be nice if we didn't need another page bit for FM; do you see any
> reasonable way that could happen?

We may be able to remove page bit for FM from page header, but I'm not sure
we could do that.

> + * If we didn't pin the visibility(and frozen) map page and the
page
> has
> + * become all visible(and frozen) while we were busy locking the
> buffer,
> + * or during some subsequent window during which we had it
unlocked,
> + * we'll have to unlock and re-lock, to avoid holding the buffer
> lock
> + * across an I/O. That's a bit unfortunate, especially since
we'll
> now
> + * have to recheck whether the tuple has been locked or updated
> under us,
> + * but hopefully it won't happen very often.
> */
>
> s/(and frozen)/ or frozen/
>
>
> + * Reply XLOG_HEAP3_FROZENMAP record.
> s/Reply/Replay/

Understood.

>
> + /*
> + * XLogReplayBufferExtended locked the buffer. But
> frozenmap_set
> + * will handle locking itself.
> + */
> + LockBuffer(fmbuffer, BUFFER_LOCK_UNLOCK);
>
> Doesn't this create a race condition?
>
>
> Are you sure the bit in finish_heap_swap() is safe? If so, we should add
the
> same the same for the visibility map too (it certainly better be all
visible
> if it's frozen...)

We can not ensure page is all visible even if we execute VACUUM FULL,
because of dead tuple could be remained. e.g. the case when other process
does insert and update to same tuple in same transaction before VACUUM FULL.
I was thinking that the FrozenMap is free of the influence of delete
operation. But as I said at top of this mail, a bit of FrozenMap needs to
be cleared on deletion.
So I will remove these related code as you mentioned.

>
>
>
> + /*
> + * Current block is all-visible.
> + * If frozen map represents that it's all frozen
and
> this
> + * function is called for freezing tuples, we can
> skip to
> + * vacuum block.
> + */
>
> I would state this as "Even if scan_all is true, we can skip blocks that
are
> marked as frozen."
>
> + if (frozenmap_test(onerel, blkno, &fmbuffer) &&
> scan_all)
>
> I suspect it's faster to reverse those tests (scan_all &&
> frozenmap_test())... but why do we even need to look at scan_all? AFAICT
if
> a block as frozen we can skip it unconditionally.

The tuple which is frozen and dead, could be remained in page is marked all
frozen, in currently patch.
i.g., There is possible to exist the page is not all visible but marked
frozen.
But I'm thinking to change that.

>
>
> + /*
> + * If the un-frozen tuple is remaining in current
> page and
> + * current page is marked as ALL_FROZEN, we should
> clear it.
> + */
>
> That needs to NEVER happen. If it does then we're going to consider tuples
> as visible/frozen that shouldn't be. We should probably throw an error
here,
> because it means the heap is now corrupted. At the minimum it needs to be
an
> assert().

I understood. I'll fix it.

> Note that I haven't reviewed all the logic in detail at this point. If
this
> ends up being refactored it'll be a lot easier to spot logic problems, so
> I'll hold off on that for now.

Understood, we need to get consen at first.

Regards,

-------
Sawada Masahiko

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-04-21 15:02:11 Re: Freeze avoidance of very large table.
Previous Message Andres Freund 2015-04-21 14:57:45 Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0