Re: Freeze avoidance of very large table.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Greg S <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Freeze avoidance of very large table.
Date:	2016-03-10 19:27:55
Message-ID:	CA+TgmoYJgy+-KXKTOUYDmApW4VZ1aAbRQrRhpzDx4=55Vr7EAg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Mar 10, 2016 at 1:41 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Fri, Mar 11, 2016 at 1:03 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> This 001 patch looks so little like what I was expecting that I
>> decided to start over from scratch. The new version I wrote is
>> attached here. I don't understand why your version tinkers with the
>> logic for setting the all-frozen bit; I thought that what I already
>> committed dealt with that already, and in any case, your version
>> doesn't even compile against latest sources. Your version also leaves
>> the scan_all terminology intact even though it's not accurate any
>> more, and I am not very convinced that the updates to the
>> page-skipping logic are actually correct. Please have a look over
>> this version and see what you think.
>
> Thank you for your advice.
> Sorry, the optimising logic in the previous patch was an old version by mistake.
> The attached latest patch incorporates your suggestions, with a little revising.

OK, I'll have a look. Thanks.

>> I think that's kind of pointless. We need to test that this
>> conversion code works, but once it does, I don't think we should make
>> everybody pay the overhead of retesting that. Anyway, the test code
>> could have bugs, too.
>>
>> Here's an updated version of your patch with that code removed and
>> some cosmetic cleanups like fixing typos and stuff like that. I think
>> this is mostly ready to commit, but I noticed one problem: your
>> conversion code always produces two output pages for each input page
>> even if one of them would be empty. In particular, if you have a
>> large number of small relations and run pg_upgrade, all of their
>> visibility maps will go from 8kB to 16kB. That isn't the end of the
>> world, maybe, but I think you should see if you can't fix it
>> somehow....
>
> Thank you for updating the patch.
> To deal with this problem, I've changed it so that pg_upgrade checks
> the file size before conversion.
> If the fork file does not exist or its size is 0 (empty), it is ignored.
> The latest patch is attached.

I think what I really want is some logic so that if we have a 1-page
visibility map in the old cluster and the second half of that page is
all zeroes, we only create a 1-page visibility map in the new cluster
rather than a 2-page visibility map.

Or more generally, if the old VM is N pages, but the last half of the
last page is empty, then let the output VM be 2*N-1 pages instead of
2*N pages.
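
To make the rule concrete, here is a rough sketch in C of the page-count
calculation only. It is not the pg_upgrade conversion code: the helper
names (all_zeroes, new_vm_page_count) are made up for illustration, and it
assumes the caller passes just the bitmap portion of the last old page,
with the page header already skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if every byte in buf[0..len) is zero. */
static bool
all_zeroes(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] != 0)
            return false;
    }
    return true;
}

/*
 * Hypothetical helper: number of pages the converted visibility map needs.
 *
 * old_npages  - number of pages in the old visibility map (N)
 * last_bitmap - bitmap bytes of the last old page (assumed: header excluded)
 * map_size    - number of bitmap bytes per page (assumed to be even)
 *
 * Each old page expands to two new pages, because the new format uses two
 * bits per heap page instead of one.  If the second half of the last old
 * page is all zeroes, the final output page would be empty, so it is
 * omitted: 2*N - 1 pages instead of 2*N.
 */
static size_t
new_vm_page_count(size_t old_npages, const unsigned char *last_bitmap,
                  size_t map_size)
{
    size_t half;

    if (old_npages == 0)
        return 0;               /* nothing to convert */

    assert(map_size % 2 == 0);
    half = map_size / 2;

    if (all_zeroes(last_bitmap + half, half))
        return 2 * old_npages - 1;

    return 2 * old_npages;
}
```

With this, a 1-page old map whose second half is empty yields a 1-page new
map, which is the small-relation case described above.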

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Freeze avoidance of very large table. at 2016-03-10 18:41:55 from Masahiko Sawada
