Re: Freeze avoidance of very large table.

From: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Freeze avoidance of very large table.
Date: 2015-07-07 14:18:05
Message-ID: CAD21AoC88gDOfSnJ0czkeM_9snqrvCxf0mai3Y3mMDoK43VFNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> So I don't understand why you have two separate calls to visibilitymap_clear()
> Surely the logic should be to clear both bits at the same time?
Yes, you're right. all-frozen bit should be cleared at the same time
as clearing all-visible bit.

> 1. Both bits unset ~(VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN)
> which can be changed to state 2 only
> 2. VISIBILITYMAP_ALL_VISIBLE only
> which can be changed state 1 or state 3
> 3. VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN
> which can be changed to state 1 only
> If that is the case please simplify the logic for setting and unsetting the bits so they are set together efficiently.
> At the same time please also put in Asserts to ensure that the state logic is maintained when it is set and when it is tested.
>
> In patch, during Vacuum first the frozen bit is set and then the visibility
> will be set in a later operation, now if the crash happens between those
> 2 operations, then isn't it possible that the frozen bit is set and visible
> bit is not set?

In current patch, frozen bit is set first in lazy_scan_heap(), so it's
possible to have VM bits set frozen bit but not visible as Amit
pointed out.
To fix it, I'm modifying the patch to more simpler and setting both
bits at the same time efficiently.

> I would also like to see the visibilitymap_test function exposed in SQL,
> so we can write code to examine the map contents for particular ctids.
> By doing that we can then write a formal test that shows the evolution of tuples from insertion,
> vacuuming and freezing, testing the map has been set correctly at each stage.
> I guess that needs to be done as an isolationtest so we have an observer that contrains the xmin in various ways.
> In light of multixact bugs, any code that changes the on-disk tuple metadata needs formal tests.

Attached patch adds a few function to contrib/pg_freespacemap to
explore the inside of visibility map, which I used for my test.
I hope it helps for testing this feature.

> I think we need something for pg_upgrade to rewrite existing VMs.
> Otherwise a large read only database would suddenly require a massive
> revacuum after upgrade, which seems bad. That can wait for now until we all
> agree this patch is sound.

Yeah, I will address them.

Regards,

--
Sawada Masahiko

Attachment Content-Type Size
001_visibilitymap_test_function.patch application/octet-stream 4.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-07-07 14:26:56 Re: Comfortably check BackendPID with psql
Previous Message Tom Lane 2015-07-07 14:17:39 Re: Comfortably check BackendPID with psql