From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Greg S <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Freeze avoidance of very large table. |
Date: | 2016-03-10 19:27:55 |
Message-ID: | CA+TgmoYJgy+-KXKTOUYDmApW4VZ1aAbRQrRhpzDx4=55Vr7EAg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Mar 10, 2016 at 1:41 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Fri, Mar 11, 2016 at 1:03 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> This 001 patch looks so little like what I was expecting that I
>> decided to start over from scratch. The new version I wrote is
>> attached here. I don't understand why your version tinkers with the
>> logic for setting the all-frozen bit; I thought that what I already
>> committed dealt with that already, and in any case, your version
>> doesn't even compile against latest sources. Your version also leaves
>> the scan_all terminology intact even though it's not accurate any
>> more, and I am not very convinced that the updates to the
>> page-skipping logic are actually correct. Please have a look over
>> this version and see what you think.
>
> Thank you for your advise.
> Sorry, optimising logic of previous patch was old by mistake.
> Attached latest patch incorporated your suggestions with a little revising.
OK, I'll have a look. Thanks.
>> I think that's kind of pointless. We need to test that this
>> conversion code works, but once it does, I don't think we should make
>> everybody pay the overhead of retesting that. Anyway, the test code
>> could have bugs, too.
>>
>> Here's an updated version of your patch with that code removed and
>> some cosmetic cleanups like fixing typos and stuff like that. I think
>> this is mostly ready to commit, but I noticed one problem: your
>> conversion code always produces two output pages for each input page
>> even if one of them would be empty. In particular, if you have a
>> large number of small relations and run pg_upgrade, all of their
>> visibility maps will go from 8kB to 16kB. That isn't the end of the
>> world, maybe, but I think you should see if you can't fix it
>> somehow....
>
> Thank you for updating patch.
> To deal with this problem, I've changed it so that pg_upgrade checks
> file size before conversion.
> And if fork file does not exist or size is 0 (empty), ignore.
> Attached latest patch.
I think what I really want is some logic so that if we have a 1-page
visibility map in the old cluster and the second half of that page is
all zeroes, we only create a 1-page visibility map in the new cluster
rather than a 2-page visibility map.
Or more generally, if the old VM is N pages, but the last half of the
last page is empty, then let the output VM be 2*N-1 pages instead of
2*N pages.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2016-03-10 19:31:55 | Re: pg_rewind just doesn't fsync *anything*? |
Previous Message | Tom Lane | 2016-03-10 19:16:03 | Re: WIP: Upper planner pathification |