Re: Freeze avoidance of very large table.

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Freeze avoidance of very large table.
Date: 2015-12-17 06:17:03
Message-ID: CAB7nPqQkZ+QXesZMKVLcGfgUcbzXqqqRSmTPJ83Yy9e_q0K08A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 10, 2015 at 3:31 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Nov 30, 2015 at 12:58 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> On Mon, Nov 30, 2015 at 10:48:04PM +0530, Masahiko Sawada wrote:
>>> On Sun, Nov 29, 2015 at 2:21 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>> > On Tue, Nov 24, 2015 at 3:13 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> >>
>>> >> Yeah, we need to consider to compute checksum if enabled.
>>> >> I've changed the patch, and attached.
>>> >> Please review it.
>>> >
>>> > Thanks for the update. This now conflicts with the updates doesn to
>>> > fix pg_upgrade out-of-space issue on Windows. I've fixed (I think) the
>>> > conflict in order to do some testing, but I'd like to get an updated
>>> > patch from the author in case I did it wrong. I don't want to find
>>> > bugs that I just introduced myself.
>>> >
>>>
>>> Thank you for having a look.
>>
>> I would not bother mentioning this detail in the pg_upgrade manual page:
>>
>> + Since the format of visibility map has been changed in version 9.6,
>> + <application>pg_upgrade</> creates and rewrite new <literal>'_vm'</literal>
>> + file even if upgrading from 9.5 or before to 9.6 or later with link mode (-k).
>
> Really? I know we don't always document things like this, but it
> seems like a good idea to me that we do so.

Just going though v30...

+ frozen. The whole-table freezing is occuerred only when all pages happen to
+ require freezing to freeze rows. In other cases such as where

I am not really getting the meaning of this sentence. Shouldn't this
be reworded something like:
"Freezing occurs on the whole table once all pages of this relation require it."

+ <structfield>relfrozenxid</> is more than
<varname>vacuum_freeze_table_age</>
+ transcations old, where <command>VACUUM</>'s <literal>FREEZE</>
option is used,
+ <command>VACUUM</> can skip the pages that all tuples on the page
itself are
+ marked as frozen.
+ When all pages of table are eventually marked as frozen by
<command>VACUUM</>,
+ after it's finished <literal>age(relfrozenxid)</> should be a little more
+ than the <varname>vacuum_freeze_min_age</> setting that was used (more by
+ the number of transcations started since the <command>VACUUM</> started).
+ If the advancing of <structfield>relfrozenxid</> is not happend until
+ <varname>autovacuum_freeze_max_age</> is reached, an autovacuum will soon
+ be forced for the table.

s/transcations/transactions.

+ <entry><structfield>n_frozen_page</></entry>
+ <entry><type>integer</></entry>
+ <entry>Number of frozen pages</entry>
n_frozen_pages?

make check with pg_upgrade is failing for me:
Visibility map rewriting test failed
make: *** [check] Error 1

-GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
+GetVisibilitymapPins(Relation relation, Buffer buffer1, Buffer buffer2,
This looks like an unrelated change.

* Clearing a visibility map bit is not separately WAL-logged. The callers
* must make sure that whenever a bit is cleared, the bit is cleared on WAL
- * replay of the updating operation as well.
+ * replay of the updating operation as well. And all-frozen bit must be
+ * cleared with all-visible at the same time.
This could be reformulated. This is just an addition on top of the
existing paragraph.

+ * The visibility map has the all-frozen bit which indicates all tuples on
+ * corresponding page has been completely frozen, so the visibility map is also
"have been completely frozen".

-/* Mapping from heap block number to the right bit in the visibility map */
-#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
-#define HEAPBLK_TO_MAPBYTE(x) (((x) % HEAPBLOCKS_PER_PAGE) /
HEAPBLOCKS_PER_BYTE)
Those two declarations are just noise in the patch: those definitions
are unchanged.

- elog(DEBUG1, "vm_clear %s %d", RelationGetRelationName(rel), heapBlk);
+ elog(DEBUG1, "vm_clear %s block %d",
RelationGetRelationName(rel), heapBlk);
This may be better as a separate patch.

-visibilitymap_count(Relation rel)
+visibilitymap_count(Relation rel, BlockNumber *all_frozen)
I think that this routine would gain in clarity if reworked as follows:
visibilitymap_count(Relation rel, BlockNumber *all_visible,
BlockNumber *all_frozen)

+ /*
+ * Report ANALYZE to the stats collector, too.
However, if doing
+ * inherited stats we shouldn't report, because the
stats collector only
+ * tracks per-table stats.
+ */
+ if (!inh)
+ pgstat_report_analyze(onerel, totalrows,
totaldeadrows, relallfrozen);
Here we already know that this is working in the non-inherited code
path. As a whole, why that? Why isn't relallfrozen passed as an
argument of vac_update_relstats and then inserted in pg_class? Maybe I
am missing something..

+ * mxid full-table scan limit. During full scan, we could skip some pags
+ * according to all-frozen bit of visibility map.
s/pags/pages

+ * Also, skipping even a single page accorinding to all-visible bit of
s/accorinding/according.

So, lazy_scan_heap is the central and really vital portion of the patch...

+ /* Check whether this tuple is alrady
frozen or not */
s/alrady/already

-heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId
*visibility_cutoff_xid)
+heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId
*visibility_cutoff_xid,
+ bool *all_frozen)
I think you would want to change that to heap_page_visible_status that
returns *all_visible as well. At least it seems to be a more
consistent style of routine.

+ * The format of visibility map is changed with this 9.6 commit,
+ */
+#define VISIBILITY_MAP_FROZEN_BIT_CAT_VER 201512021
It looks a bit strange to have a different flag for the vm with the
new frozen bit. Couldn't we merge that into a unique version number? I
guess that we should just ask for a vm rewrite anyway in any case if
pg_upgrade is used on the version of pg with the new vm format, no?

Sawada-san, are you planning to continue working on that? At this
stage it seems that this patch is not in commitable state and should
at best be moved to next CF, or at worst returned with feedback.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-12-17 06:24:52 Re: Making tab-complete.c easier to maintain
Previous Message Simon Riggs 2015-12-17 06:11:33 Re: Performance improvement for joins where outer side is unique