Re: Freeze avoidance of very large table.

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Freeze avoidance of very large table.
Date: 2015-10-28 03:58:47
Message-ID: CAA4eK1JsdAwjMFj7D=auiC6R24HbeeEDim_mJmRYjtTxU93Cwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 24, 2015 at 2:24 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:
>
> On Sat, Oct 24, 2015 at 10:59 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> >
> > I think we can display information about relallfrozen it in
pg_stat_*_tables
> > as suggested by you. It doesn't make much sense to keep it in pg_class
> > unless we have some usecase for the same.
> >
>
> I'm thinking a bit about implementing the read-only table that is
> restricted to update/delete and is ensured that whole table is frozen,
> if this feature is committed.
> The value of relallfrozen might be useful for such feature.
>

If we need this for read-only table feature, then better lets add that
after discussing the design of that feature. It doesn't seem to be
advisable to have an extra field in system table which we might
need in yet not completely-discussed feature.

Review Comments:
-------------------------------
1.
/*
- * Find buffer to insert this tuple into. If the page is all visible,
- * this will also pin
the requisite visibility map page.
+ * Find buffer to insert this tuple into. If the page is all
visible
+ * or all frozen, this will also pin the requisite visibility map and
+ * frozen map page.

*/
buffer = RelationGetBufferForTuple(relation, heaptup->t_len,

InvalidBuffer, options, bistate,

I think it is sufficient to say in the end 'visibility map page'.
Let's not include 'frozen map page'.

2.
+ * corresponding page has been completely frozen, so the visibility map is
also
+ * used for anti-wraparound
vacuum, even if freezing tuples is required.

/all tuple/all tuples
/freezing tuples/freezing of tuples

3.
- * Are all tuples on heapBlk visible to all, according to the visibility
map?
+ * Are all tuples on heapBlk
visible or frozen to all, according to the visibility map?

I think it is better to modify the above statement as:
Are all tuples on heapBlk visible to all or are marked as frozen, according
to the visibility map?

4.
+ * releasing *buf after it's done testing and setting bits, and must set
flags
+ * which indicates what flag
we want to test.

Here are you talking about the flags passed to visibilitymap_set(), if
yes, then above comment is not clear, how about:

and must pass flags
for which it needs to check the value in visibility map.

5.
+ * both how many pages we skipped according to all-frozen bit of visibility
+ * map and how many
pages we freeze page, so we can update relfrozenxid if

In above sentence word 'page' after freeze sounds redundant.
/we freeze page/we freeze

Another suggestion:
/sum of them/sum of two

6.
+ * This block is at least all-visible according to visibility map.
+
* We check whehter this block is all-frozen or not, to skip to

whether is mis-spelled

7.
+ * If we froze any tuples or any tuples are already frozen,
+ * mark the buffer
dirty, and write a WAL record recording the changes.

Here, I think WAL record is written only when we mark some
tuple/'s as frozen not if we they are already frozen,
so in that regard, I think above comment is wrong.

8.
+ /*
+ * We cant't allow upgrading with link mode between 9.5 or before and 9.6
or later,
+ *
because the format of visibility map has been changed on version 9.6.
+ */

a. /cant't/can't
b. changed on version 9.6/changed in version 9.6
b. Won't such a change needs to be updated in pg_upgrade
documentation (Notes Section)?

9.
@@ -180,6 +181,13 @@ transfer_single_new_db(pageCnvCtx *pageConverter,

new_cluster.controldata.cat_ver >= VISIBILITY_MAP_CRASHSAFE_CAT_VER)
vm_crashsafe_match = false;

+
/*
+ * Do we need to rewrite visibilitymap?
+ */
+ if (old_cluster.controldata.cat_ver <
VISIBILITY_MAP_FROZEN_BIT_CAT_VER &&
+ new_cluster.controldata.cat_ver >=
VISIBILITY_MAP_FROZEN_BIT_CAT_VER)
+ vm_rewrite_needed = true;

..

@@ -276,7 +285,15 @@ transfer_relfile(pageCnvCtx *pageConverter,
FileNameMap *map,
{

pg_log(PG_VERBOSE, "copying \"%s\" to \"%s\"\n", old_file, new_file);

- if ((msg =
copyAndUpdateFile(pageConverter, old_file, new_file, true)) != NULL)
+ /*
+
* Do we need to rewrite visibilitymap?
+ */
+ if (strcmp
(type_suffix, "_vm") == 0 &&
+ old_cluster.controldata.cat_ver <
VISIBILITY_MAP_FROZEN_BIT_CAT_VER &&
+ new_cluster.controldata.cat_ver >=
VISIBILITY_MAP_FROZEN_BIT_CAT_VER)
+ rewrite_vm = true;

Instead of doing re-check in transfer_relfile(), I think it is better
to pass an additional parameter in this function.

10.
You have mentioned up-thread that, you have changed the patch so that
PageClearAllVisible clear both bits, can you please point me to this
change?
Basically after applying the patch, I see below code in bufpage.h:
#define PageClearAllVisible(page) \
(((PageHeader) (page))->pd_flags &= ~PD_ALL_VISIBLE)

Don't we need to clear the PD_ALL_FROZEN separately?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2015-10-28 05:07:20 Re: BUG #13741: vacuumdb does not accept valid password
Previous Message Muthiah Rajan 2015-10-28 03:57:25 Re: Disabling START TRANSACTION for a SuperUser