Re: Freeze avoidance of very large table.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: amit(dot)kapila16(at)gmail(dot)com
Cc: sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, Jim(dot)Nasby(at)bluetreble(dot)com, michael(dot)paquier(at)gmail(dot)com, petr(at)2ndquadrant(dot)com, bruce(at)momjian(dot)us, simon(at)2ndquadrant(dot)com, andres(at)anarazel(dot)de, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org, jeff(dot)janes(at)gmail(dot)com
Subject: Re: Freeze avoidance of very large table.
Date: 2015-11-05 09:03:29
Message-ID: 20151105.180329.151649663.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, I had a look on v21 patch.

Though I haven't looked the whole of the patch, I'd like to show
you some comments only for visibilitymap.c and a part of the
documentation.

1. Patch application

patch command complains about offsets for heapam.c at current
master.

2. visitibilymap_test()

- if (visibilitymap_test(rel, blkno, &vmbuffer))
+ if (visibilitymap_test(rel, blkno, &vmbuffer, VISIBILITYMAP_ALL_VISIBLE)

The old VM was a simple bitmap so the name _test and the
function are proper but now the bitmap is quad state so it'd be
better chainging the function. Alghough it is not so expensive
to call it twice successively, it is a bit uneasy for me doing
so. One possible shape would be like the following.

lazy_vacuum_page()
> int vmstate = visibilitymap_get_status(rel, blkno, &vmbuffer);
> if (!(vmstate & VISIBILITYMAP_ALL_VISIBLE))
> ...
> if (all_frozen && !(vmstate & VISIBILITYMAP_ALL_FROZEN))
> ...
> if (flags != vmstate)
> visibilitymap_set(...., flags);

and defining two macros for indivisual tests,

> #define VM_ALL_VISIBLE(r, b, v) ((vm_get_status((r), (b), (v)) & .._VISIBLE) != 0)
> if (VM_ALL_VISIBLE(rel, blkno, &vmbuffer))
and
> if (VM_ALL_FROZEN(rel, blkno, &vmbuffer))

How about this?

3. visibilitymap.c

- HEAPBLK_TO_MAPBIT

In visibilitymap_clear and other functions, mapBit means
mapDualBit in the patch, and mapBit always appears in the form
"mapBit * BITS_PER_HEAPBLOCK". So, it'd be better to change the
definition of HEAPBLK_TO_MAPBIT so that it calculates really
the bit position in a byte.

- #define HEAPBLK_TO_MAPBIT(x) ((x) % HEAPBLOCKS_PER_BYTE)
+ #define HEAPBLK_TO_MAPBIT(x) (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

- visibilitymap_count()

The third argument all_frozen is not necessary in some
usage. So this interface would be preferable to be as
following,

BlockNumber
visibilitymap_count(Relation rel, BlockNumber *all_frozen)
{
BlockNumber all_visible = 0;
...
if (all_frozen)
*all_frozen = 0;
... something like ...

- visibilitymap_set()

The check for ALL_VISIBLE is duplicate in the following
assertion.

> Assert((flags & VISIBILITYMAP_ALL_VISIBLE) ||
> (flags & (VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN)));

4. documentation

- 18.11.1 Statement Hehavior

A typo.

> VACUUM performs *a* aggressive freezing

However I am not a fluent English speaker, and such
wordsmithing would be done by someone else, I feel that
"eager/greedy" is more suitable for this meaning..,
nevertheless, the term "whole-table freezing" that you wrote
elsewhere in this patch would be usable.

"VACUUM performs a whole-table freezing"

All "a table scan/sweep"s and something has the similar
meaning would be better be changed to "a whole-table
freezing"

In similar manner, "tuples/rows that are marked as frozen"
could be replaced with "unfrozen tuples/rows".

- 23.1.5 Preventing Transaction ID Wraparound Failures

"The whole table is scanned only when all pages happen to
require vacuuming to remove dead row versions."

This description looks a bit out-of-point. "the whole table
scan" in the original description is what is triggered by
relfrozenxid so the correspondent in the revised description
is "the whole-table freezing", maybe.

"The whole-table feezing takes place when
<structfield>relfrozenxid</> is more than
<varname>vacuum_freeze_table_age</> transactions old or when
<command>VACUUM</>'s <literal>FREEZE</> option is used. The
whole-table freezing scans all unfreezed pages."

The last sentence might be unnecessary.

- 63.4 Visibility Map

"pages contain only tuples that are marked as frozen" would be
enough to be "pages contain only frozen tuples"

and according to the discussion upthread, we might be good to
have some desciption that the name is historically omitting
the aspect of freezemap.

At Sat, 31 Oct 2015 18:07:32 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in <CAA4eK1+aTdaSwG3u+y8fXxn67Kkj0T1KzRsFDLEi=tQvTYgFrQ(at)mail(dot)gmail(dot)com>
amit.kapila16> On Fri, Oct 30, 2015 at 6:03 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> Couple of more review comments:
> ------------------------------------------------------
>
> 1.
> @@ -615,6 +617,8 @@ typedef struct PgStat_StatTabEntry
> PgStat_Counter n_dead_tuples;
> PgStat_Counter
> changes_since_analyze;
>
> + int32 n_frozen_pages;
> +
> PgStat_Counter blocks_fetched;
> PgStat_Counter
> blocks_hit;
>
> As you are changing above structure, you need to update
> PGSTAT_FILE_FORMAT_ID, refer below code:
> #define PGSTAT_FILE_FORMAT_ID 0x01A5BC9D
>
> 2. It seems that n_frozen_page is not initialized/updated properly
> for toast tables:
>
> Try with below steps:
>
> postgres=# create table t4(c1 int, c2 text);
> CREATE TABLE
> postgres=# select oid, relname from pg_class where relname like '%t4%';
> oid | relname
> -------+---------
> 16390 | t4
> (1 row)
>
>
> postgres=# select oid, relname from pg_class where relname like '%16390%';
> oid | relname
> -------+----------------------
> 16393 | pg_toast_16390
> 16395 | pg_toast_16390_index
> (2 rows)
>
> postgres=# select relname,seq_scan,n_tup_ins,last_vacuum,n_frozen_page from
> pg_s
> tat_all_tables where relname like '%16390%';
> relname | seq_scan | n_tup_ins | last_vacuum | n_frozen_page
> ----------------+----------+-----------+-------------+---------------
> pg_toast_16390 | 1 | 0 | | -842150451
> (1 row)
>
> Note that I have tested above scenario on my Windows 7 m/c.
>
> 3.
> * visibilitymap.c
> * bitmap for tracking visibility of heap tuples
>
> I think above needs to be changed to:
> bitmap for tracking visibility and frozen state of heap tuples
>
>
> 4.
> a.
> /*
> - * If we froze any tuples, mark the buffer dirty, and write a WAL
> - * record recording the changes. We must log the changes to be
> - * crash-safe against future truncation of CLOG.
> + * If we froze any tuples then we mark the buffer dirty, and write a WAL
>
> b.
> - * Release any remaining pin on visibility map page.
> + * Release any remaining pin on visibility map.
>
> c.
> * We do update relallvisible even in the corner case, since if the table
> - * is all-visible
> we'd definitely like to know that. But clamp the value
> - * to be not more than what we're setting
> relpages to.
> + * is all-visible we'd definitely like to know that.
> + * But clamp the value to be not more
> than what we're setting relpages to.
>
> I don't think you need to change above comments.
>
> 5.
> + * Even if scan_all is set so far, we could skip to scan some pages
> + * according by all-frozen
> bit of visibility amp.
>
> /according by/according to
> /amp/map
>
> I suggested to modify comment as below:
> During full scan, we could skip some pages according to all-frozen
> bit of visibility map.
>
> Also no need to start this in new line, start from where the
> previous line of comment ends.
>
> 6.
> /*
> * lazy_scan_heap() -- scan an open heap relation
> *
> * This routine prunes each page in the
> heap, which will among other
> * things truncate dead tuples to dead line pointers, defragment the
> *
> page, and set commit status bits (see heap_page_prune). It also builds
> * lists of dead
> tuples and pages with free space, calculates statistics
> * on the number of live tuples in the
> heap, and marks pages as
> * all-visible if appropriate.
>
> Modify above function header as:
>
> all-visible, all-frozen
>
> 7.
> lazy_scan_heap()
> {
> ..
>
> if (PageIsEmpty(page))
> {
> empty_pages++;
> freespace =
> PageGetHeapFreeSpace(page);
>
> /* empty pages are always all-visible */
> if (!PageIsAllVisible(page))
> ..
> }
>
> Don't we need to ensure that empty pages should get marked as
> all-frozen?
>
> 8.
> lazy_scan_heap()
> {
> ..
> /*
> * As of PostgreSQL 9.2, the visibility map bit should never be set if
> * the page-
> level bit is clear. However, it's possible that the bit
> * got cleared after we checked it
> and before we took the buffer
> * content lock, so we must recheck before jumping to the conclusion
> * that something bad has happened.
> */
> else if (all_visible_according_to_vm
> && !PageIsAllVisible(page)
> && visibilitymap_test(onerel, blkno, &vmbuffer,
> VISIBILITYMAP_ALL_VISIBLE))
> {
> elog(WARNING, "page is not marked all-visible
> but visibility map bit is set in relation \"%s\" page %u",
> relname, blkno);
> visibilitymap_clear(onerel, blkno, vmbuffer);
> }
>
> /*
> *
> It's possible for the value returned by GetOldestXmin() to move
> * backwards, so it's not wrong for
> us to see tuples that appear to
> * not be visible to everyone yet, while PD_ALL_VISIBLE is already
> * set. The real safe xmin value never moves backwards, but
> * GetOldestXmin() is
> conservative and sometimes returns a value
> * that's unnecessarily small, so if we see that
> contradiction it just
> * means that the tuples that we think are not visible to everyone yet
> * actually are, and the PD_ALL_VISIBLE flag is correct.
> *
> * There should never
> be dead tuples on a page with PD_ALL_VISIBLE
> * set, however.
> */
> else
> if (PageIsAllVisible(page) && has_dead_tuples)
> {
> elog(WARNING, "page
> containing dead tuples is marked as all-visible in relation \"%s\" page %u",
>
> relname, blkno);
> PageClearAllVisible(page);
> MarkBufferDirty(buf);
> visibilitymap_clear(onerel, blkno, vmbuffer);
> }
>
> ..
> }
>
> I think both the above cases could happen for frozen state
> as well, unless you think otherwise, we need similar handling
> for frozen bit.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2015-11-05 09:10:31 psql completion for ids in multibyte string
Previous Message Torsten Zühlsdorff 2015-11-05 07:59:37 Re: September 2015 Commitfest