Re: Freeze avoidance of very large table.

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Freeze avoidance of very large table.
Date: 2015-10-02 15:23:44
Message-ID: 20151002152344.GB2573@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Masahiko Sawada wrote:

> @@ -2972,10 +2981,15 @@ l1:
> */
> PageSetPrunable(page, xid);
>
> + /* clear PD_ALL_VISIBLE and PD_ALL_FORZEN flags */

Typo "FORZEN".

> if (PageIsAllVisible(page))
> {
> all_visible_cleared = true;
> +
> + /* all-frozen information is also cleared at the same time */
> PageClearAllVisible(page);
> + PageClearAllFrozen(page);

I wonder if it makes sense to have a macro to clear both in unison,
which seems a very common pattern.


> diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
> index 7c38772..a284b85 100644
> --- a/src/backend/access/heap/visibilitymap.c
> +++ b/src/backend/access/heap/visibilitymap.c
> @@ -21,33 +21,45 @@
> *
> * NOTES
> *
> - * The visibility map is a bitmap with one bit per heap page. A set bit means
> - * that all tuples on the page are known visible to all transactions, and
> - * therefore the page doesn't need to be vacuumed. The map is conservative in
> - * the sense that we make sure that whenever a bit is set, we know the
> - * condition is true, but if a bit is not set, it might or might not be true.
> + * The visibility map is a bitmap with two bits (all-visible and all-frozen)
> + * per heap page. A set all-visible bit means that all tuples on the page are
> + * known visible to all transactions, and therefore the page doesn't need to
> + * be vacuumed. A set all-frozen bit means that all tuples on the page are
> + * completely frozen, and therefore the page doesn't need to be vacuumed even
> + * if whole table scanning vacuum is required (e.g. anti-wraparound vacuum).
> + * A all-frozen bit must be set only when the page is already all-visible.
> + * That is, all-frozen bit is always set with all-visible bit.

"A all-frozen" -> "The all-frozen" (but "A set all-xyz" is correct).

> * When we *set* a visibility map during VACUUM, we must write WAL. This may
> * seem counterintuitive, since the bit is basically a hint: if it is clear,
> - * it may still be the case that every tuple on the page is visible to all
> - * transactions; we just don't know that for certain. The difficulty is that
> - * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
> - * on the page itself, and the visibility map bit. If a crash occurs after the
> - * visibility map page makes it to disk and before the updated heap page makes
> - * it to disk, redo must set the bit on the heap page. Otherwise, the next
> - * insert, update, or delete on the heap page will fail to realize that the
> - * visibility map bit must be cleared, possibly causing index-only scans to
> - * return wrong answers.
> + * it may still be the case that every tuple on the page is visible or frozen
> + * to all transactions; we just don't know that for certain. The difficulty is
> + * that there are two bits which are typically set together: the PD_ALL_VISIBLE
> + * or PD_ALL_FROZEN bit on the page itself, and the visibility map bit. If a
> + * crash occurs after the visibility map page makes it to disk and before the
> + * updated heap page makes it to disk, redo must set the bit on the heap page.
> + * Otherwise, the next insert, update, or delete on the heap page will fail to
> + * realize that the visibility map bit must be cleared, possibly causing index-only
> + * scans to return wrong answers.

In the "The difficulty ..." para, I would add the word "corresponding" before
"visibility". Otherwise, it is not clear what the plural means exactly.

> * VACUUM will normally skip pages for which the visibility map bit is set;
> * such pages can't contain any dead tuples and therefore don't need vacuuming.
> - * The visibility map is not used for anti-wraparound vacuums, because
> + * The visibility map is not used for anti-wraparound vacuums before 9.5, because
> * an anti-wraparound vacuum needs to freeze tuples and observe the latest xid
> * present in the table, even on pages that don't have any dead tuples.
> + * 9.6 or later, the visibility map has a additional bit which indicates all tuple
> + * on single page has been completely forzen, so the visibility map is also used for
> + * anti-wraparound vacuums.

This should not mention database versions. Just explain how the code
behaves today, not how it behaved in the past. Those who want to
understand how it behaved in 9.5 can read the 9.5 code. (Again typo
"forzen".)

> @@ -1115,6 +1187,14 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
> tups_vacuumed, vacuumed_pages)));
>
> /*
> + * This information would be effective for how much effect all-frozen bit
> + * of VM had for freezing tuples.
> + */
> + ereport(elevel,
> + (errmsg("Skipped %d frozen pages acoording to visibility map",
> + vacrelstats->vmskipped_frozen_pages)));

Message must start on lowercase letter. I don't understand what the
comment means. Can you rephrase it?

> @@ -1779,10 +1873,12 @@ vac_cmp_itemptr(const void *left, const void *right)
> /*
> * Check if every tuple in the given page is visible to all current and future
> * transactions. Also return the visibility_cutoff_xid which is the highest
> - * xmin amongst the visible tuples.
> + * xmin amongst the visible tuples, and all_forzen which implies that all tuples
> + * of this page are frozen.

Typo "forzen" here again.

> @@ -201,6 +239,110 @@ copy_file(const char *srcfile, const char *dstfile, bool force)
> #endif
>
>
> +/*
> + * rewriteVisibilitymap()
> + *
> + * A additional bit which indicates that all tuples on page is completely
> + * frozen is added into visibility map at PG 9.6. So the format of visibiilty
> + * map has been changed.
> + * Copies a visibility map file while adding all-frozen bit(0) into each bit.
> + */
> +static const char *
> +rewriteVisibilitymap(const char *fromfile, const char *tofile, bool force)
> +{
> +#define REWRITE_BUF_SIZE (50 * BLCKSZ)
> +#define BITS_PER_HEAPBLOCK 2
> +
> + int src_fd, dst_fd;
> + uint16 vm_bits;
> + ssize_t nbytes;
> + char *buffer;
> + int ret = 0;
> + int save_errno = 0;
> +
> + if ((fromfile == NULL) || (tofile == NULL))
> + {
> + errno = EINVAL;
> + return getErrorText(errno);
> + }
> +
> + if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0)
> + return getErrorText(errno);
> +
> + if ((dst_fd = open(tofile, O_RDWR | O_CREAT | (force ? 0 : O_EXCL), S_IRUSR | S_IWUSR)) < 0)
> + {
> + save_errno = errno;
> + if (src_fd != 0)
> + close(src_fd);
> +
> + errno = save_errno;
> + return getErrorText(errno);
> + }
> +
> + buffer = (char *) pg_malloc(REWRITE_BUF_SIZE);
> +
> + /* Copy page header data in advance */
> + if ((nbytes = read(src_fd, buffer, MAXALIGN(SizeOfPageHeaderData))) <= 0)
> + {
> + save_errno = errno;
> + return getErrorText(errno);
> + }

Not clear why you bother with save_errno in this path. Forgot to
close()? (Though I wonder why you bother to close() if the program is
going to exit shortly thereafter anyway.)

> diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
> index 13aa891..fc92a5f 100644
> --- a/src/bin/pg_upgrade/pg_upgrade.h
> +++ b/src/bin/pg_upgrade/pg_upgrade.h
> @@ -112,6 +112,11 @@ extern char *output_files[];
> #define VISIBILITY_MAP_CRASHSAFE_CAT_VER 201107031
>
> /*
> + * The format of visibility map changed with this 9.6 commit,
> + *
> + */
> +#define VISIBILITY_MAP_FROZEN_BIT_CAT_VER 201509181

Useless empty line in comment.

> diff --git a/src/common/relpath.c b/src/common/relpath.c
> index 66dfef1..52ff14e 100644
> --- a/src/common/relpath.c
> +++ b/src/common/relpath.c
> @@ -30,6 +30,9 @@
> * If you add a new entry, remember to update the errhint in
> * forkname_to_number() below, and update the SGML documentation for
> * pg_relation_size().
> + * 9.6 or later, the visibility map fork name is changed from "vm" to
> + * "vfm" bacause visibility map has not only information about all-visible
> + * but also information about all-frozen.
> */
> const char *const forkNames[] = {
> "main", /* MAIN_FORKNUM */

Drop the change in comment? There's no "vfm" in this version of the
patch, is there?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua D. Drake 2015-10-02 16:28:02 Request for dogfood volunteers (was No Issue Tracker - Say it Ain't So!)
Previous Message Bruce Momjian 2015-10-02 15:21:11 Re: What is the extent of FDW join pushdown support in 9.5?