Re: Freeze avoidance of very large table.

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Freeze avoidance of very large table.
Date: 2015-10-31 12:37:32
Message-ID: CAA4eK1+aTdaSwG3u+y8fXxn67Kkj0T1KzRsFDLEi=tQvTYgFrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 30, 2015 at 6:03 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:
>
>
>
> v20 patch has a bug in result of regression test.
> Attached updated v21 patch.
>

Couple of more review comments:
------------------------------------------------------

1.
@@ -615,6 +617,8 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter n_dead_tuples;
PgStat_Counter
changes_since_analyze;

+ int32 n_frozen_pages;
+
PgStat_Counter blocks_fetched;
PgStat_Counter
blocks_hit;

As you are changing above structure, you need to update
PGSTAT_FILE_FORMAT_ID, refer below code:
#define PGSTAT_FILE_FORMAT_ID 0x01A5BC9D

2. It seems that n_frozen_page is not initialized/updated properly
for toast tables:

Try with below steps:

postgres=# create table t4(c1 int, c2 text);
CREATE TABLE
postgres=# select oid, relname from pg_class where relname like '%t4%';
oid | relname
-------+---------
16390 | t4
(1 row)

postgres=# select oid, relname from pg_class where relname like '%16390%';
oid | relname
-------+----------------------
16393 | pg_toast_16390
16395 | pg_toast_16390_index
(2 rows)

postgres=# select relname,seq_scan,n_tup_ins,last_vacuum,n_frozen_page from
pg_s
tat_all_tables where relname like '%16390%';
relname | seq_scan | n_tup_ins | last_vacuum | n_frozen_page
----------------+----------+-----------+-------------+---------------
pg_toast_16390 | 1 | 0 | | -842150451
(1 row)

Note that I have tested above scenario on my Windows 7 m/c.

3.
* visibilitymap.c
* bitmap for tracking visibility of heap tuples

I think above needs to be changed to:
bitmap for tracking visibility and frozen state of heap tuples

4.
a.
/*
- * If we froze any tuples, mark the buffer dirty, and write a WAL
- * record recording the changes. We must log the changes to be
- * crash-safe against future truncation of CLOG.
+ * If we froze any tuples then we mark the buffer dirty, and write a WAL

b.
- * Release any remaining pin on visibility map page.
+ * Release any remaining pin on visibility map.

c.
* We do update relallvisible even in the corner case, since if the table
- * is all-visible
we'd definitely like to know that. But clamp the value
- * to be not more than what we're setting
relpages to.
+ * is all-visible we'd definitely like to know that.
+ * But clamp the value to be not more
than what we're setting relpages to.

I don't think you need to change above comments.

5.
+ * Even if scan_all is set so far, we could skip to scan some pages
+ * according by all-frozen
bit of visibility amp.

/according by/according to
/amp/map

I suggested to modify comment as below:
During full scan, we could skip some pages according to all-frozen
bit of visibility map.

Also no need to start this in new line, start from where the
previous line of comment ends.

6.
/*
* lazy_scan_heap() -- scan an open heap relation
*
* This routine prunes each page in the
heap, which will among other
* things truncate dead tuples to dead line pointers, defragment the
*
page, and set commit status bits (see heap_page_prune). It also builds
* lists of dead
tuples and pages with free space, calculates statistics
* on the number of live tuples in the
heap, and marks pages as
* all-visible if appropriate.

Modify above function header as:

all-visible, all-frozen

7.
lazy_scan_heap()
{
..

if (PageIsEmpty(page))
{
empty_pages++;
freespace =
PageGetHeapFreeSpace(page);

/* empty pages are always all-visible */
if (!PageIsAllVisible(page))
..
}

Don't we need to ensure that empty pages should get marked as
all-frozen?

8.
lazy_scan_heap()
{
..
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if
* the page-
level bit is clear. However, it's possible that the bit
* got cleared after we checked it
and before we took the buffer
* content lock, so we must recheck before jumping to the conclusion
* that something bad has happened.
*/
else if (all_visible_according_to_vm
&& !PageIsAllVisible(page)
&& visibilitymap_test(onerel, blkno, &vmbuffer,
VISIBILITYMAP_ALL_VISIBLE))
{
elog(WARNING, "page is not marked all-visible
but visibility map bit is set in relation \"%s\" page %u",
relname, blkno);
visibilitymap_clear(onerel, blkno, vmbuffer);
}

/*
*
It's possible for the value returned by GetOldestXmin() to move
* backwards, so it's not wrong for
us to see tuples that appear to
* not be visible to everyone yet, while PD_ALL_VISIBLE is already
* set. The real safe xmin value never moves backwards, but
* GetOldestXmin() is
conservative and sometimes returns a value
* that's unnecessarily small, so if we see that
contradiction it just
* means that the tuples that we think are not visible to everyone yet
* actually are, and the PD_ALL_VISIBLE flag is correct.
*
* There should never
be dead tuples on a page with PD_ALL_VISIBLE
* set, however.
*/
else
if (PageIsAllVisible(page) && has_dead_tuples)
{
elog(WARNING, "page
containing dead tuples is marked as all-visible in relation \"%s\" page %u",

relname, blkno);
PageClearAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_clear(onerel, blkno, vmbuffer);
}

..
}

I think both the above cases could happen for frozen state
as well, unless you think otherwise, we need similar handling
for frozen bit.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2015-10-31 14:42:45 Re: ALTER ... OWNER TO ... vs. ALTER DEFAULT PRIVILEGES
Previous Message konstantin knizhnik 2015-10-31 09:22:07 eXtensible Transaction Manager API