Re: Freeze avoidance of very large table.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Freeze avoidance of very large table.
Date: 2015-12-17 18:17:47
Message-ID: CAD21AoBScUD4k_QWrYGRmbXVruiekPY=2BY2Fxhqq55a+tzUxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 17, 2015 at 11:47 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Dec 10, 2015 at 3:31 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Nov 30, 2015 at 12:58 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>> On Mon, Nov 30, 2015 at 10:48:04PM +0530, Masahiko Sawada wrote:
>>>> On Sun, Nov 29, 2015 at 2:21 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>>> > On Tue, Nov 24, 2015 at 3:13 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> >>
>>>> >> Yeah, we need to consider to compute checksum if enabled.
>>>> >> I've changed the patch, and attached.
>>>> >> Please review it.
>>>> >
>>>> > Thanks for the update. This now conflicts with the updates doesn to
>>>> > fix pg_upgrade out-of-space issue on Windows. I've fixed (I think) the
>>>> > conflict in order to do some testing, but I'd like to get an updated
>>>> > patch from the author in case I did it wrong. I don't want to find
>>>> > bugs that I just introduced myself.
>>>> >
>>>>
>>>> Thank you for having a look.
>>>
>>> I would not bother mentioning this detail in the pg_upgrade manual page:
>>>
>>> + Since the format of visibility map has been changed in version 9.6,
>>> + <application>pg_upgrade</> creates and rewrite new <literal>'_vm'</literal>
>>> + file even if upgrading from 9.5 or before to 9.6 or later with link mode (-k).
>>
>> Really? I know we don't always document things like this, but it
>> seems like a good idea to me that we do so.
>
> Just going though v30...
>
> + frozen. The whole-table freezing is occuerred only when all pages happen to
> + require freezing to freeze rows. In other cases such as where
>
> I am not really getting the meaning of this sentence. Shouldn't this
> be reworded something like:
> "Freezing occurs on the whole table once all pages of this relation require it."
>
> + <structfield>relfrozenxid</> is more than
> <varname>vacuum_freeze_table_age</>
> + transcations old, where <command>VACUUM</>'s <literal>FREEZE</>
> option is used,
> + <command>VACUUM</> can skip the pages that all tuples on the page
> itself are
> + marked as frozen.
> + When all pages of table are eventually marked as frozen by
> <command>VACUUM</>,
> + after it's finished <literal>age(relfrozenxid)</> should be a little more
> + than the <varname>vacuum_freeze_min_age</> setting that was used (more by
> + the number of transcations started since the <command>VACUUM</> started).
> + If the advancing of <structfield>relfrozenxid</> is not happend until
> + <varname>autovacuum_freeze_max_age</> is reached, an autovacuum will soon
> + be forced for the table.
>
> s/transcations/transactions.
>
> + <entry><structfield>n_frozen_page</></entry>
> + <entry><type>integer</></entry>
> + <entry>Number of frozen pages</entry>
> n_frozen_pages?
>
> make check with pg_upgrade is failing for me:
> Visibility map rewriting test failed
> make: *** [check] Error 1

make check with pg_upgrade is done successfully on my environment.
Could you give me more information about this?

> -GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
> +GetVisibilitymapPins(Relation relation, Buffer buffer1, Buffer buffer2,
> This looks like an unrelated change.
>
> * Clearing a visibility map bit is not separately WAL-logged. The callers
> * must make sure that whenever a bit is cleared, the bit is cleared on WAL
> - * replay of the updating operation as well.
> + * replay of the updating operation as well. And all-frozen bit must be
> + * cleared with all-visible at the same time.
> This could be reformulated. This is just an addition on top of the
> existing paragraph.
>
> + * The visibility map has the all-frozen bit which indicates all tuples on
> + * corresponding page has been completely frozen, so the visibility map is also
> "have been completely frozen".
>
> -/* Mapping from heap block number to the right bit in the visibility map */
> -#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
> -#define HEAPBLK_TO_MAPBYTE(x) (((x) % HEAPBLOCKS_PER_PAGE) /
> HEAPBLOCKS_PER_BYTE)
> Those two declarations are just noise in the patch: those definitions
> are unchanged.
>
> - elog(DEBUG1, "vm_clear %s %d", RelationGetRelationName(rel), heapBlk);
> + elog(DEBUG1, "vm_clear %s block %d",
> RelationGetRelationName(rel), heapBlk);
> This may be better as a separate patch.

I've attached 001 patch for this separately.

>
> -visibilitymap_count(Relation rel)
> +visibilitymap_count(Relation rel, BlockNumber *all_frozen)
> I think that this routine would gain in clarity if reworked as follows:
> visibilitymap_count(Relation rel, BlockNumber *all_visible,
> BlockNumber *all_frozen)
>
> + /*
> + * Report ANALYZE to the stats collector, too.
> However, if doing
> + * inherited stats we shouldn't report, because the
> stats collector only
> + * tracks per-table stats.
> + */
> + if (!inh)
> + pgstat_report_analyze(onerel, totalrows,
> totaldeadrows, relallfrozen);
> Here we already know that this is working in the non-inherited code
> path. As a whole, why that? Why isn't relallfrozen passed as an
> argument of vac_update_relstats and then inserted in pg_class? Maybe I
> am missing something..

IIUC, as per discussion, the number of frozen pages should not be
inserted into pg_class. Because it's not information used by query
planning like relallvisible, repages.

> + * mxid full-table scan limit. During full scan, we could skip some pags
> + * according to all-frozen bit of visibility map.
> s/pags/pages
>
> + * Also, skipping even a single page accorinding to all-visible bit of
> s/accorinding/according.
>
> So, lazy_scan_heap is the central and really vital portion of the patch...
>
> + /* Check whether this tuple is alrady
> frozen or not */
> s/alrady/already
>
> -heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId
> *visibility_cutoff_xid)
> +heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId
> *visibility_cutoff_xid,
> + bool *all_frozen)
> I think you would want to change that to heap_page_visible_status that
> returns *all_visible as well. At least it seems to be a more
> consistent style of routine.
>
> + * The format of visibility map is changed with this 9.6 commit,
> + */
> +#define VISIBILITY_MAP_FROZEN_BIT_CAT_VER 201512021
> It looks a bit strange to have a different flag for the vm with the
> new frozen bit. Couldn't we merge that into a unique version number? I
> guess that we should just ask for a vm rewrite anyway in any case if
> pg_upgrade is used on the version of pg with the new vm format, no?

Thank you for your review.
Please find the attached latest v31 patches.

>
> Sawada-san, are you planning to continue working on that? At this
> stage it seems that this patch is not in commitable state and should
> at best be moved to next CF, or at worst returned with feedback.

Yes, of course.
This patch should be marked as "Move to next CF".

Regards,

--
Masahiko Sawada

Attachment Content-Type Size
000_add_frozen_bit_into_visibilitymap_v31.patch application/octet-stream 88.2 KB
001_enhance_visibilitymap_debug_messages_v1.patch application/octet-stream 1.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-12-17 18:20:23 Re: Fwd: Cluster "stuck" in "not accepting commands to avoid wraparound data loss"
Previous Message Robert Haas 2015-12-17 18:12:24 Re: Inaccurate results from numeric ln(), log(), exp() and pow()