Re: Freeze avoidance of very large table.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Freeze avoidance of very large table.
Date: 2015-10-30 00:33:13
Message-ID: CAD21AoC3HqF9Gxfw=+FTvX=CEMJOtxOZjqfTbc9aoCh=LGakbQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 30, 2015 at 1:26 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Wed, Oct 28, 2015 at 12:58 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Sat, Oct 24, 2015 at 2:24 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
>> wrote:
>>>
>>> On Sat, Oct 24, 2015 at 10:59 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>>> wrote:
>>> >
>>> > I think we can display information about relallfrozen it in
>>> > pg_stat_*_tables
>>> > as suggested by you. It doesn't make much sense to keep it in pg_class
>>> > unless we have some usecase for the same.
>>> >
>>>
>>> I'm thinking a bit about implementing the read-only table that is
>>> restricted to update/delete and is ensured that whole table is frozen,
>>> if this feature is committed.
>>> The value of relallfrozen might be useful for such feature.
>>>
>
> Thank you for reviewing!
>
>> If we need this for read-only table feature, then better lets add that
>> after discussing the design of that feature. It doesn't seem to be
>> advisable to have an extra field in system table which we might
>> need in yet not completely-discussed feature.
>
> I changed it so that the number of frozen pages is stored in
> pg_stat_all_tables as statistics information.
> Also, the tests related to counting all-visible bit and skipping
> vacuum are added to visibility map test, and the test related to
> counting all-frozen is added to stats collector test.
>
> Attached updated v20 patch.
>
>> Review Comments:
>> -------------------------------
>> 1.
>> /*
>> - * Find buffer to insert this tuple into. If the page is all visible,
>> - * this will also pin
>> the requisite visibility map page.
>> + * Find buffer to insert this tuple into. If the page is all
>> visible
>> + * or all frozen, this will also pin the requisite visibility map and
>> + * frozen map page.
>>
>> */
>> buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
>>
>> InvalidBuffer, options, bistate,
>>
>>
>> I think it is sufficient to say in the end 'visibility map page'.
>> Let's not include 'frozen map page'.
>
> Fixed.
>
>>
>> 2.
>> + * corresponding page has been completely frozen, so the visibility map is
>> also
>> + * used for anti-wraparound
>> vacuum, even if freezing tuples is required.
>>
>> /all tuple/all tuples
>> /freezing tuples/freezing of tuples
>
> Fixed.
>
>> 3.
>> - * Are all tuples on heapBlk visible to all, according to the visibility
>> map?
>> + * Are all tuples on heapBlk
>> visible or frozen to all, according to the visibility map?
>>
>> I think it is better to modify the above statement as:
>> Are all tuples on heapBlk visible to all or are marked as frozen, according
>> to the visibility map?
>
> Fixed.
>
>> 4.
>> + * releasing *buf after it's done testing and setting bits, and must set
>> flags
>> + * which indicates what flag
>> we want to test.
>>
>> Here are you talking about the flags passed to visibilitymap_set(), if
>> yes, then above comment is not clear, how about:
>>
>> and must pass flags
>> for which it needs to check the value in visibility map.
>
> Fixed.
>
>> 5.
>> + * both how many pages we skipped according to all-frozen bit of visibility
>> + * map and how many
>> pages we freeze page, so we can update relfrozenxid if
>>
>> In above sentence word 'page' after freeze sounds redundant.
>> /we freeze page/we freeze
>>
>> Another suggestion:
>> /sum of them/sum of two
>
> Fixed.
>
>> 6.
>> + * This block is at least all-visible according to visibility map.
>> +
>> * We check whehter this block is all-frozen or not, to skip to
>>
>> whether is mis-spelled
>
> Fixed.
>
>> 7.
>> + * If we froze any tuples or any tuples are already frozen,
>> + * mark the buffer
>> dirty, and write a WAL record recording the changes.
>>
>> Here, I think WAL record is written only when we mark some
>> tuple/'s as frozen not if we they are already frozen,
>> so in that regard, I think above comment is wrong.
>
> It's wrong.
> Fixed.
>
>> 8.
>> + /*
>> + * We cant't allow upgrading with link mode between 9.5 or before and 9.6
>> or later,
>> + *
>> because the format of visibility map has been changed on version 9.6.
>> + */
>>
>>
>> a. /cant't/can't
>> b. changed on version 9.6/changed in version 9.6
>> b. Won't such a change needs to be updated in pg_upgrade
>> documentation (Notes Section)?
>
> Fixed.
> And updated document.
>
>> 9.
>> @@ -180,6 +181,13 @@ transfer_single_new_db(pageCnvCtx *pageConverter,
>>
>> new_cluster.controldata.cat_ver >= VISIBILITY_MAP_CRASHSAFE_CAT_VER)
>> vm_crashsafe_match = false;
>>
>> +
>> /*
>> + * Do we need to rewrite visibilitymap?
>> + */
>> + if (old_cluster.controldata.cat_ver <
>> VISIBILITY_MAP_FROZEN_BIT_CAT_VER &&
>> + new_cluster.controldata.cat_ver >=
>> VISIBILITY_MAP_FROZEN_BIT_CAT_VER)
>> + vm_rewrite_needed = true;
>>
>> ..
>>
>> @@ -276,7 +285,15 @@ transfer_relfile(pageCnvCtx *pageConverter, FileNameMap
>> *map,
>> {
>>
>> pg_log(PG_VERBOSE, "copying \"%s\" to \"%s\"\n", old_file, new_file);
>>
>> - if ((msg =
>> copyAndUpdateFile(pageConverter, old_file, new_file, true)) != NULL)
>> + /*
>> +
>> * Do we need to rewrite visibilitymap?
>> + */
>> + if (strcmp
>> (type_suffix, "_vm") == 0 &&
>> + old_cluster.controldata.cat_ver <
>> VISIBILITY_MAP_FROZEN_BIT_CAT_VER &&
>> + new_cluster.controldata.cat_ver >=
>> VISIBILITY_MAP_FROZEN_BIT_CAT_VER)
>> + rewrite_vm = true;
>>
>> Instead of doing re-check in transfer_relfile(), I think it is better
>> to pass an additional parameter in this function.
>
> I agree.
> Fixed.
>
>>
>> 10.
>> You have mentioned up-thread that, you have changed the patch so that
>> PageClearAllVisible clear both bits, can you please point me to this
>> change?
>> Basically after applying the patch, I see below code in bufpage.h:
>> #define PageClearAllVisible(page) \
>> (((PageHeader) (page))->pd_flags &= ~PD_ALL_VISIBLE)
>>
>> Don't we need to clear the PD_ALL_FROZEN separately?
>
> Previous patch is wrong. PageClearAllVisible() should be;
> #define PageClearAllVisible(page) \
> (((PageHeader) (page))->pd_flags &= ~(PD_ALL_VISIBLE | PD_ALL_FROZEN))
>
> The all-frozen flag/bit is cleared only by modifying page, so it is
> impossible that only all-frozen flags/bit is cleared.
> The clearing of all-visible flag/bit also means that the page has some
> garbage, and is needed to vacuum.
>

v20 patch has a bug in result of regression test.
Attached updated v21 patch.

Regards,

--
Masahiko Sawada

Attachment Content-Type Size
000_add_frozen_bit_into_visibilitymap_v21.patch text/x-patch 83.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2015-10-30 00:37:46 Re: Are we sufficiently clear that jsonb containment is nested?
Previous Message Tom Lane 2015-10-29 23:03:15 Re: Are we sufficiently clear that jsonb containment is nested?