Re: Freeze avoidance of very large table.

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Greg S <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Freeze avoidance of very large table.
Date: 2016-03-08 16:23:54
Message-ID: CAMkU=1z68U5Co4aisNOzBgtXEehweydtr2SKngzH0hwoYHPgnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 8, 2016 at 5:30 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Mar 8, 2016 at 7:26 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> Regarding pg_visibility module, I'd like to share some bugs and
>> propose to add a relation type condition to each functions.
>
> OK, thanks.
>
>> Including it, I've attached remaining 2 patches; one is removing page
>> conversion code from pg_upgarde, and another is supporting pg_upgrade
>> for frozen bit.
>
> Committed 001 with minor tweaks.
>
> I find rewrite_vm_table to be pretty opaque. There's not even a
> comment explaining what it is supposed to do. And I wonder why we
> really need to be this efficient about it anyway. Like, would it be
> too expensive to just do this:
>
> for (i = 0; i < BITS_PER_BYTE; ++i)
> if ((old & (1 << i)) != 0)
> new |= 1 << (2 * i);
>
> And how about adding some more comments explaining why we are doing
> this rewriting, like this:
>
> In versions of PostgreSQL prior to catversion 201602181, PostgreSQL's
> visibility map included one bit per heap page; it now includes two.
> When upgrading a cluster from before that time to a current PostgreSQL
> version, we could refuse to copy visibility maps from the old cluster
> to the new cluster; the next VACUUM would recreate them, but at the
> price of scanning the entire table. So, instead, we rewrite the old
> visibility maps in the new format. That way, the all-visible bit
> remains set for the pages for which it was set previously. The
> all-frozen bit is never set by this conversion; we leave that to
> VACUUM.
>
> Also, I'm slightly perplexed by the fact that I can't see how this
> code succeeds in turning each page into two pages, which is something
> that it seems like it would need to do. Wouldn't we need to write out
> the old page header twice, one for the first of the two new pages and
> again for the second? I probably need more caffeine here, so please
> tell me what I'm missing.

I think that this loop:

while (blkend >= end)

Executes exactly twice for each iteration of the outer loop. I'd
rather see it written as a loop which explicitly executes twice,
rather looking like it might execute a dynamic number of times. I
can't imagine that this code needs to be future-proof. If we change
the format again in the future, surely we can't just change this code,
we would have to write new code for the new format.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2016-03-08 16:31:21 Re: Odd warning from pg_dump
Previous Message Andrew Dunstan 2016-03-08 16:22:07 Re: VS 2015 support in src/tools/msvc