Re: Freeze avoidance of very large table.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg S <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Freeze avoidance of very large table.
Date: 2015-11-12 23:18:43
Message-ID: CAD21AoCTJ4n9qm09S5yucvoDDHHcOJporTFXWbj4ax=NxJ5EPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 5, 2015 at 6:03 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello, I had a look on v21 patch.
>
> Though I haven't looked the whole of the patch, I'd like to show
> you some comments only for visibilitymap.c and a part of the
> documentation.
>
>
> 1. Patch application
>
> patch command complains about offsets for heapam.c at current
> master.
>
> 2. visitibilymap_test()
>
> - if (visibilitymap_test(rel, blkno, &vmbuffer))
> + if (visibilitymap_test(rel, blkno, &vmbuffer, VISIBILITYMAP_ALL_VISIBLE)
>
> The old VM was a simple bitmap so the name _test and the
> function are proper but now the bitmap is quad state so it'd be
> better chainging the function. Alghough it is not so expensive
> to call it twice successively, it is a bit uneasy for me doing
> so. One possible shape would be like the following.
>
> lazy_vacuum_page()
> > int vmstate = visibilitymap_get_status(rel, blkno, &vmbuffer);
> > if (!(vmstate & VISIBILITYMAP_ALL_VISIBLE))
> > ...
> > if (all_frozen && !(vmstate & VISIBILITYMAP_ALL_FROZEN))
> > ...
> > if (flags != vmstate)
> > visibilitymap_set(...., flags);
>
> and defining two macros for indivisual tests,
>
> > #define VM_ALL_VISIBLE(r, b, v) ((vm_get_status((r), (b), (v)) & .._VISIBLE) != 0)
> > if (VM_ALL_VISIBLE(rel, blkno, &vmbuffer))
> and
> > if (VM_ALL_FROZEN(rel, blkno, &vmbuffer))
>
> How about this?
>
>
> 3. visibilitymap.c
>
> - HEAPBLK_TO_MAPBIT
>
> In visibilitymap_clear and other functions, mapBit means
> mapDualBit in the patch, and mapBit always appears in the form
> "mapBit * BITS_PER_HEAPBLOCK". So, it'd be better to change the
> definition of HEAPBLK_TO_MAPBIT so that it calculates really
> the bit position in a byte.
>
> - #define HEAPBLK_TO_MAPBIT(x) ((x) % HEAPBLOCKS_PER_BYTE)
> + #define HEAPBLK_TO_MAPBIT(x) (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)
>
>
> - visibilitymap_count()
>
> The third argument all_frozen is not necessary in some
> usage. So this interface would be preferable to be as
> following,
>
> BlockNumber
> visibilitymap_count(Relation rel, BlockNumber *all_frozen)
> {
> BlockNumber all_visible = 0;
> ...
> if (all_frozen)
> *all_frozen = 0;
> ... something like ...
>
> - visibilitymap_set()
>
> The check for ALL_VISIBLE is duplicate in the following
> assertion.
>
> > Assert((flags & VISIBILITYMAP_ALL_VISIBLE) ||
> > (flags & (VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN)));
>
>
>
> 4. documentation
>
> - 18.11.1 Statement Hehavior
>
> A typo.
>
> > VACUUM performs *a* aggressive freezing
>
> However I am not a fluent English speaker, and such
> wordsmithing would be done by someone else, I feel that
> "eager/greedy" is more suitable for this meaning..,
> nevertheless, the term "whole-table freezing" that you wrote
> elsewhere in this patch would be usable.
>
> "VACUUM performs a whole-table freezing"
>
> All "a table scan/sweep"s and something has the similar
> meaning would be better be changed to "a whole-table
> freezing"
>
> In similar manner, "tuples/rows that are marked as frozen"
> could be replaced with "unfrozen tuples/rows".
>
> - 23.1.5 Preventing Transaction ID Wraparound Failures
>
> "The whole table is scanned only when all pages happen to
> require vacuuming to remove dead row versions."
>
> This description looks a bit out-of-point. "the whole table
> scan" in the original description is what is triggered by
> relfrozenxid so the correspondent in the revised description
> is "the whole-table freezing", maybe.
>
> "The whole-table feezing takes place when
> <structfield>relfrozenxid</> is more than
> <varname>vacuum_freeze_table_age</> transactions old or when
> <command>VACUUM</>'s <literal>FREEZE</> option is used. The
> whole-table freezing scans all unfreezed pages."
>
> The last sentence might be unnecessary.
>
>
> - 63.4 Visibility Map
>
> "pages contain only tuples that are marked as frozen" would be
> enough to be "pages contain only frozen tuples"
>
> and according to the discussion upthread, we might be good to
> have some desciption that the name is historically omitting
> the aspect of freezemap.
>
>
> At Sat, 31 Oct 2015 18:07:32 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in <CAA4eK1+aTdaSwG3u+y8fXxn67Kkj0T1KzRsFDLEi=tQvTYgFrQ(at)mail(dot)gmail(dot)com>
> amit.kapila16> On Fri, Oct 30, 2015 at 6:03 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
>> Couple of more review comments:
>> ------------------------------------------------------
>>
>> 1.
>> @@ -615,6 +617,8 @@ typedef struct PgStat_StatTabEntry
>> PgStat_Counter n_dead_tuples;
>> PgStat_Counter
>> changes_since_analyze;
>>
>> + int32 n_frozen_pages;
>> +
>> PgStat_Counter blocks_fetched;
>> PgStat_Counter
>> blocks_hit;
>>
>> As you are changing above structure, you need to update
>> PGSTAT_FILE_FORMAT_ID, refer below code:
>> #define PGSTAT_FILE_FORMAT_ID 0x01A5BC9D
>>
>> 2. It seems that n_frozen_page is not initialized/updated properly
>> for toast tables:
>>
>> Try with below steps:
>>
>> postgres=# create table t4(c1 int, c2 text);
>> CREATE TABLE
>> postgres=# select oid, relname from pg_class where relname like '%t4%';
>> oid | relname
>> -------+---------
>> 16390 | t4
>> (1 row)
>>
>>
>> postgres=# select oid, relname from pg_class where relname like '%16390%';
>> oid | relname
>> -------+----------------------
>> 16393 | pg_toast_16390
>> 16395 | pg_toast_16390_index
>> (2 rows)
>>
>> postgres=# select relname,seq_scan,n_tup_ins,last_vacuum,n_frozen_page from
>> pg_s
>> tat_all_tables where relname like '%16390%';
>> relname | seq_scan | n_tup_ins | last_vacuum | n_frozen_page
>> ----------------+----------+-----------+-------------+---------------
>> pg_toast_16390 | 1 | 0 | | -842150451
>> (1 row)
>>
>> Note that I have tested above scenario on my Windows 7 m/c.
>>
>> 3.
>> * visibilitymap.c
>> * bitmap for tracking visibility of heap tuples
>>
>> I think above needs to be changed to:
>> bitmap for tracking visibility and frozen state of heap tuples
>>
>>
>> 4.
>> a.
>> /*
>> - * If we froze any tuples, mark the buffer dirty, and write a WAL
>> - * record recording the changes. We must log the changes to be
>> - * crash-safe against future truncation of CLOG.
>> + * If we froze any tuples then we mark the buffer dirty, and write a WAL
>>
>> b.
>> - * Release any remaining pin on visibility map page.
>> + * Release any remaining pin on visibility map.
>>
>> c.
>> * We do update relallvisible even in the corner case, since if the table
>> - * is all-visible
>> we'd definitely like to know that. But clamp the value
>> - * to be not more than what we're setting
>> relpages to.
>> + * is all-visible we'd definitely like to know that.
>> + * But clamp the value to be not more
>> than what we're setting relpages to.
>>
>> I don't think you need to change above comments.
>>
>> 5.
>> + * Even if scan_all is set so far, we could skip to scan some pages
>> + * according by all-frozen
>> bit of visibility amp.
>>
>> /according by/according to
>> /amp/map
>>
>> I suggested to modify comment as below:
>> During full scan, we could skip some pages according to all-frozen
>> bit of visibility map.
>>
>> Also no need to start this in new line, start from where the
>> previous line of comment ends.
>>
>> 6.
>> /*
>> * lazy_scan_heap() -- scan an open heap relation
>> *
>> * This routine prunes each page in the
>> heap, which will among other
>> * things truncate dead tuples to dead line pointers, defragment the
>> *
>> page, and set commit status bits (see heap_page_prune). It also builds
>> * lists of dead
>> tuples and pages with free space, calculates statistics
>> * on the number of live tuples in the
>> heap, and marks pages as
>> * all-visible if appropriate.
>>
>> Modify above function header as:
>>
>> all-visible, all-frozen
>>
>> 7.
>> lazy_scan_heap()
>> {
>> ..
>>
>> if (PageIsEmpty(page))
>> {
>> empty_pages++;
>> freespace =
>> PageGetHeapFreeSpace(page);
>>
>> /* empty pages are always all-visible */
>> if (!PageIsAllVisible(page))
>> ..
>> }
>>
>> Don't we need to ensure that empty pages should get marked as
>> all-frozen?
>>
>> 8.
>> lazy_scan_heap()
>> {
>> ..
>> /*
>> * As of PostgreSQL 9.2, the visibility map bit should never be set if
>> * the page-
>> level bit is clear. However, it's possible that the bit
>> * got cleared after we checked it
>> and before we took the buffer
>> * content lock, so we must recheck before jumping to the conclusion
>> * that something bad has happened.
>> */
>> else if (all_visible_according_to_vm
>> && !PageIsAllVisible(page)
>> && visibilitymap_test(onerel, blkno, &vmbuffer,
>> VISIBILITYMAP_ALL_VISIBLE))
>> {
>> elog(WARNING, "page is not marked all-visible
>> but visibility map bit is set in relation \"%s\" page %u",
>> relname, blkno);
>> visibilitymap_clear(onerel, blkno, vmbuffer);
>> }
>>
>> /*
>> *
>> It's possible for the value returned by GetOldestXmin() to move
>> * backwards, so it's not wrong for
>> us to see tuples that appear to
>> * not be visible to everyone yet, while PD_ALL_VISIBLE is already
>> * set. The real safe xmin value never moves backwards, but
>> * GetOldestXmin() is
>> conservative and sometimes returns a value
>> * that's unnecessarily small, so if we see that
>> contradiction it just
>> * means that the tuples that we think are not visible to everyone yet
>> * actually are, and the PD_ALL_VISIBLE flag is correct.
>> *
>> * There should never
>> be dead tuples on a page with PD_ALL_VISIBLE
>> * set, however.
>> */
>> else
>> if (PageIsAllVisible(page) && has_dead_tuples)
>> {
>> elog(WARNING, "page
>> containing dead tuples is marked as all-visible in relation \"%s\" page %u",
>>
>> relname, blkno);
>> PageClearAllVisible(page);
>> MarkBufferDirty(buf);
>> visibilitymap_clear(onerel, blkno, vmbuffer);
>> }
>>
>> ..
>> }
>>
>> I think both the above cases could happen for frozen state
>> as well, unless you think otherwise, we need similar handling
>> for frozen bit.
>

Thank you for reviewing the patch.

I changed the patch so that the visibility map become the page info
map, in source code and documentation.
And fixed review comments I received.
Attached v22 patch.

> I think both the above cases could happen for frozen state
> as well, unless you think otherwise, we need similar handling
> for frozen bit.

It's not happen the situation where is all-frozen and not all-visible,
and the bits of visibility map are cleared at the same time, page
flags are as well.
So I think it's enough to handle only all-visible situation. Am I
missing something?

> 2. visitibilymap_test()
>
> - if (visibilitymap_test(rel, blkno, &vmbuffer))
> + if (visibilitymap_test(rel, blkno, &vmbuffer, VISIBILITYMAP_ALL_VISIBLE)
>
> The old VM was a simple bitmap so the name _test and the
> function are proper but now the bitmap is quad state so it'd be
> better chainging the function. Alghough it is not so expensive
> to call it twice successively, it is a bit uneasy for me doing
> so. One possible shape would be like the following.
>
> lazy_vacuum_page()
> > int vmstate = visibilitymap_get_status(rel, blkno, &vmbuffer);
> > if (!(vmstate & VISIBILITYMAP_ALL_VISIBLE))
> > ...
> > if (all_frozen && !(vmstate & VISIBILITYMAP_ALL_FROZEN))
> > ...
> > if (flags != vmstate)
> > visibilitymap_set(...., flags);
>
> and defining two macros for indivisual tests,
>
> > #define VM_ALL_VISIBLE(r, b, v) ((vm_get_status((r), (b), (v)) & .._VISIBLE) != 0)
> > if (VM_ALL_VISIBLE(rel, blkno, &vmbuffer))
> and
> > if (VM_ALL_FROZEN(rel, blkno, &vmbuffer))
>
> How about this?

I agree.
I've changed so.

Regards,

--
Masahiko Sawada

Attachment Content-Type Size
000_page_info_map_v22.patch text/x-patch 169.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2015-11-13 00:07:01 Re: Support for N synchronous standby servers - take 2
Previous Message Alvaro Herrera 2015-11-12 21:58:41 Re: BUG #13741: vacuumdb does not accept valid password