Re: Freeze avoidance of very large table.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Freeze avoidance of very large table.
Date: 2015-10-29 16:26:06
Message-ID: CAD21AoCadYViiR6Q-fHKXxT4PVUGiVxam9G-6tmFVx4MxxQ0Ug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 28, 2015 at 12:58 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Sat, Oct 24, 2015 at 2:24 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
>>
>> On Sat, Oct 24, 2015 at 10:59 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> >
>> > I think we can display information about relallfrozen it in
>> > pg_stat_*_tables
>> > as suggested by you. It doesn't make much sense to keep it in pg_class
>> > unless we have some usecase for the same.
>> >
>>
>> I'm thinking a bit about implementing the read-only table that is
>> restricted to update/delete and is ensured that whole table is frozen,
>> if this feature is committed.
>> The value of relallfrozen might be useful for such feature.
>>

Thank you for reviewing!

> If we need this for read-only table feature, then better lets add that
> after discussing the design of that feature. It doesn't seem to be
> advisable to have an extra field in system table which we might
> need in yet not completely-discussed feature.

I changed it so that the number of frozen pages is stored in
pg_stat_all_tables as statistics information.
Also, the tests related to counting all-visible bit and skipping
vacuum are added to visibility map test, and the test related to
counting all-frozen is added to stats collector test.

Attached updated v20 patch.

> Review Comments:
> -------------------------------
> 1.
> /*
> - * Find buffer to insert this tuple into. If the page is all visible,
> - * this will also pin
> the requisite visibility map page.
> + * Find buffer to insert this tuple into. If the page is all
> visible
> + * or all frozen, this will also pin the requisite visibility map and
> + * frozen map page.
>
> */
> buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
>
> InvalidBuffer, options, bistate,
>
>
> I think it is sufficient to say in the end 'visibility map page'.
> Let's not include 'frozen map page'.

Fixed.

>
> 2.
> + * corresponding page has been completely frozen, so the visibility map is
> also
> + * used for anti-wraparound
> vacuum, even if freezing tuples is required.
>
> /all tuple/all tuples
> /freezing tuples/freezing of tuples

Fixed.

> 3.
> - * Are all tuples on heapBlk visible to all, according to the visibility
> map?
> + * Are all tuples on heapBlk
> visible or frozen to all, according to the visibility map?
>
> I think it is better to modify the above statement as:
> Are all tuples on heapBlk visible to all or are marked as frozen, according
> to the visibility map?

Fixed.

> 4.
> + * releasing *buf after it's done testing and setting bits, and must set
> flags
> + * which indicates what flag
> we want to test.
>
> Here are you talking about the flags passed to visibilitymap_set(), if
> yes, then above comment is not clear, how about:
>
> and must pass flags
> for which it needs to check the value in visibility map.

Fixed.

> 5.
> + * both how many pages we skipped according to all-frozen bit of visibility
> + * map and how many
> pages we freeze page, so we can update relfrozenxid if
>
> In above sentence word 'page' after freeze sounds redundant.
> /we freeze page/we freeze
>
> Another suggestion:
> /sum of them/sum of two

Fixed.

> 6.
> + * This block is at least all-visible according to visibility map.
> +
> * We check whehter this block is all-frozen or not, to skip to
>
> whether is mis-spelled

Fixed.

> 7.
> + * If we froze any tuples or any tuples are already frozen,
> + * mark the buffer
> dirty, and write a WAL record recording the changes.
>
> Here, I think WAL record is written only when we mark some
> tuple/'s as frozen not if we they are already frozen,
> so in that regard, I think above comment is wrong.

It's wrong.
Fixed.

> 8.
> + /*
> + * We cant't allow upgrading with link mode between 9.5 or before and 9.6
> or later,
> + *
> because the format of visibility map has been changed on version 9.6.
> + */
>
>
> a. /cant't/can't
> b. changed on version 9.6/changed in version 9.6
> b. Won't such a change needs to be updated in pg_upgrade
> documentation (Notes Section)?

Fixed.
And updated document.

> 9.
> @@ -180,6 +181,13 @@ transfer_single_new_db(pageCnvCtx *pageConverter,
>
> new_cluster.controldata.cat_ver >= VISIBILITY_MAP_CRASHSAFE_CAT_VER)
> vm_crashsafe_match = false;
>
> +
> /*
> + * Do we need to rewrite visibilitymap?
> + */
> + if (old_cluster.controldata.cat_ver <
> VISIBILITY_MAP_FROZEN_BIT_CAT_VER &&
> + new_cluster.controldata.cat_ver >=
> VISIBILITY_MAP_FROZEN_BIT_CAT_VER)
> + vm_rewrite_needed = true;
>
> ..
>
> @@ -276,7 +285,15 @@ transfer_relfile(pageCnvCtx *pageConverter, FileNameMap
> *map,
> {
>
> pg_log(PG_VERBOSE, "copying \"%s\" to \"%s\"\n", old_file, new_file);
>
> - if ((msg =
> copyAndUpdateFile(pageConverter, old_file, new_file, true)) != NULL)
> + /*
> +
> * Do we need to rewrite visibilitymap?
> + */
> + if (strcmp
> (type_suffix, "_vm") == 0 &&
> + old_cluster.controldata.cat_ver <
> VISIBILITY_MAP_FROZEN_BIT_CAT_VER &&
> + new_cluster.controldata.cat_ver >=
> VISIBILITY_MAP_FROZEN_BIT_CAT_VER)
> + rewrite_vm = true;
>
> Instead of doing re-check in transfer_relfile(), I think it is better
> to pass an additional parameter in this function.

I agree.
Fixed.

>
> 10.
> You have mentioned up-thread that, you have changed the patch so that
> PageClearAllVisible clear both bits, can you please point me to this
> change?
> Basically after applying the patch, I see below code in bufpage.h:
> #define PageClearAllVisible(page) \
> (((PageHeader) (page))->pd_flags &= ~PD_ALL_VISIBLE)
>
> Don't we need to clear the PD_ALL_FROZEN separately?

Previous patch is wrong. PageClearAllVisible() should be;
#define PageClearAllVisible(page) \
(((PageHeader) (page))->pd_flags &= ~(PD_ALL_VISIBLE | PD_ALL_FROZEN))

The all-frozen flag/bit is cleared only by modifying page, so it is
impossible that only all-frozen flags/bit is cleared.
The clearing of all-visible flag/bit also means that the page has some
garbage, and is needed to vacuum.

Regards,

--
Masahiko Sawada

Attachment Content-Type Size
000_add_frozen_bit_into_visibilitymap_v20.patch text/x-patch 83.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Colin 't Hart 2015-10-29 16:31:04 Did the "Full-text search in PostgreSQL in milliseconds" patches land?
Previous Message Jeff Janes 2015-10-29 16:22:15 Re: pg_dump