Skip site navigation (1) Skip section navigation (2)

Re: limiting hint bit I/O

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: limiting hint bit I/O
Date: 2011-02-05 18:33:08
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain
<cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
> Please update the commitfest with the accurate patch, there is only
> the old immature v1 of the patch in it.
> I was about reviewing it...

Woops, sorry about that.  Here's an updated version, which I will also
add to the CommitFest application.

The need for this patch has been somewhat ameliorated by the fsync
queue compaction patch.  I tested with:

create table s as select g,
random()::text||random()::text||random()::text||random()::text from
generate_series(1,1000000) g;

The table was large enough not to fit in shared_buffers.  Then, repeatedly:

select sum(1) from s;

At the time I first posted this patch, running against git master, the
first run took about 1600 ms vs. ~207-216 ms for subsequent runs.  But
that was actually running up against the fsync queue problem.
Retesting today, the first run took 360 ms, and subsequent runs took
197-206 ms.  I doubt that the difference in the steady-state is
significant, since the tests were done on different days and not
controlled all that carefully, but clearly the response time spike for
the first scan is far lower than previously.  Setting the log level to
DEBUG1 revealed that the first scan did two fsync queue compactions.

The patch still does help to smooth things out, though.  Here are the
times for one series of selects, with the patch applied, after setting
up as described above:


Without the patch, as seen above, the first run is about ~80% slower.
With the patch applied, the first run is about 25% slower than the
steady state, and subsequent scans decline steadily from there.  Runs
21 and following flush no further data and run at full speed.  These
numbers aren't representative of all real-world scenarios, though.
On a system with many concurrent clients, CLOG contention might be an
issue; on the flip side, if this table were larger than RAM (not just
larger than shared_buffers) the decrease in write traffic as we scan
through the table might actually be a more significant benefit than it
is here, where it's mostly a question of kernel time; the I/O system
isn't actually taxed.  So I think this probably needs more testing
before we decide whether or not it's a good idea.

I adopted a few suggestions made previously in this version of the
patch.  Tom Lane recommended not messing with BM_JUST_DIRTY and
leaving that for another day.  I did that.  Also, per my previous
musings, I've adjusted this version so that vacuum behaves differently
when dirtying pages rather than when flushing them.  In versions 1 and
2, vacuum would always write pages that were dirty-only-for-hint-bits
when allocating a new buffer; in this version the buffer allocation
logic is the same for vacuum, but it marks pages dirty even when only
hint bits have changed.  The result is that VACUUM followed by
CHECKPOINT is enough to make sure all hint bits are set on disk, just
as is the case today.

Robert Haas
The Enterprise PostgreSQL Company

Attachment: bm-hint-bits-v3.patch
Description: text/x-diff (14.3 KB)

In response to


pgsql-hackers by date

Next:From: Robert HaasDate: 2011-02-05 18:40:08
Subject: Re: We need to log aborted autovacuums
Previous:From: Bruce MomjianDate: 2011-02-05 18:10:12
Subject: Re: is_absolute_path incorrect on Windows

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group