Re: Freeze avoidance of very large table.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: sawada(dot)mshk(at)gmail(dot)com, bruce(at)momjian(dot)us, alvherre(at)2ndquadrant(dot)com, Jim(dot)Nasby(at)bluetreble(dot)com, michael(dot)paquier(at)gmail(dot)com, jeff(dot)janes(at)gmail(dot)com, amit(dot)kapila16(at)gmail(dot)com, andres(at)anarazel(dot)de, simon(at)2ndquadrant(dot)com, josh(at)agliodbs(dot)com, masao(dot)fujii(at)gmail(dot)com, petr(at)2ndquadrant(dot)com, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Freeze avoidance of very large table.
Date: 2016-03-02 08:33:25
Message-ID: 20160302.173325.185017873.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for revising and commiting this.

At Tue, 1 Mar 2016 21:51:55 -0500, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoZtG7hnkgP74zRCeuRrGGG917J5-_P4dzNJz5_kAXFTKg(at)mail(dot)gmail(dot)com>
> On Thu, Feb 18, 2016 at 3:45 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > Attached updated 5 patches.
> > I would like to explain these patch shortly again here to make
> > reviewing more easier.
> >
> > We can divided these patches into 2 purposes.
> >
> > 1. Freeze map
> > 000_ patch adds additional frozen bit into visibility map, but doesn't
> > include the logic for improve freezing performance.
> > 001_ patch gets rid of page-conversion code from pg_upgrade. (This
> > patch doesn't related to this feature essentially, but is required by
> > 002_ patch.)
> > 002_ patch adds upgrading mechanism from 9.6- to 9.6+ and its regression test.
> >
> > 2. Improve freezing logic
> > 003_ patch changes the VACUUM to optimize scans based on freeze map
> > (i.g., 000_ patch), and its regression test.
> > 004_ patch enhances debug messages in src/backend/access/heap/visibilitymap.c
> >
> > Please review them.
>
> I have pushed 000 and part of 003, with substantial revisions to the
> 003 part and minor revisions to the 000 part. This gets the basic
> infrastructure in place, but the vacuum optimization and pg_upgrade
> fixes still need to be done.
>
> I discovered that make check-world failed with 000 applied, because
> the Assert()s added to visibilitymap_set were using | rather than & to
> test for a set bit. I fixed that.

It looks reasonable as far as I can see. Thank you for your
labor for the additional part.

> I revised the code in vacuumlazy.c that updates the new map bits
> rather heavily. I hope I didn't break anything; please have a look
> and see if you spot any problems. One big problem was that it's
> inadequate to judge whether a tuple needs freezing just by looking at
> xmin; xmax might need to be cleared, for example.

The new function heap_tuple_needs_eventual_freeze looks
reasonable for me in comparizon with heap_tuple_needs_freeze.

Looking the additional diff for lazy_vacuum_page, I noticed that
visibilitymap_set have a potential performance problem. (Though
it doesn't seem to occur for now.)

visibilitymap_set decides to modify vm bits by the following
code.

| if (flags = (map[mapByte] >> mapBit & VISIBILITYMAP_VALID_BITS))
| {
| START_CRIT_SECTION();
|
| map[mapByte] |= (flags << mapBit);

This is effectively right and no problem but it runs the critical
section for the case of (vmbit = 11, flags = 01), which does not
need to do so. Please apply this if this looks reasonable.

======
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 2e64fc3..87b7fc6 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -292,7 +292,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
map = (uint8 *)PageGetContents(page);
LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);

- if (flags != (map[mapByte] >> mapBit & VISIBILITYMAP_VALID_BITS))
+ /* modify vm bits only if any bit is necessary to be set */
+ if (~flags & (map[mapByte] >> mapBit & VISIBILITYMAP_VALID_BITS))
{
START_CRIT_SECTION();
======

> I removed the pgstat stuff. I'm not sure we want that stuff in that
> form; it doesn't seem to fit with the rest of what's in that view, and
> it wasn't reliable in my testing. I did however throw together a
> little contrib module for testing, which I attach here. I'm not sure
> we want to commit this, and at the least someone would need to write
> documentation. But it's certainly handy for checking whether this
> works.

I hanven't considered about the reliability but the
n_frozen_pages in the proposed patch surelly seems alien to the
columns around it.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Valery Popov 2016-03-02 08:43:23 Re: [REVIEW]: Password identifiers, protocol aging and SCRAM protocol
Previous Message Tom Lane 2016-03-02 07:51:48 Re: TAP / recovery-test fs-level backups, psql enhancements etc