Re: limiting hint bit I/O

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: limiting hint bit I/O
Date: 2011-02-05 18:33:08
Message-ID: AANLkTimGKaG7wdu-x77GNV2Gh6_Qo5Ss1u5b6Q1MsPUy@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain
<cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
> Please update the commitfest with the accurate patch, there is only
> the old immature v1 of the patch in it.
> I was about reviewing it...
>
> https://commitfest.postgresql.org/action/patch_view?id=500

Whoops, sorry about that. Here's an updated version, which I will also
add to the CommitFest application.

The need for this patch has been somewhat ameliorated by the fsync
queue compaction patch. I tested with:

create table s as
  select g, random()::text||random()::text||random()::text||random()::text
  from generate_series(1,1000000) g;
checkpoint;

The table was large enough not to fit in shared_buffers. Then, repeatedly:

select sum(1) from s;

At the time I first posted this patch, running against git master, the
first run took about 1600 ms vs. ~207-216 ms for subsequent runs. But
that was actually running up against the fsync queue problem.
Retesting today, the first run took 360 ms, and subsequent runs took
197-206 ms. I doubt that the steady-state difference is
significant, since the tests were done on different days and not
controlled all that carefully, but clearly the response time spike for
the first scan is far lower than previously. Setting the log level to
DEBUG1 revealed that the first scan did two fsync queue compactions.

The patch still does help to smooth things out, though. Here are the
times in ms for one series of selects, with the patch applied, after
setting up as described above:

257.108
259.245
249.181
245.896
250.161
241.559
240.538
241.091
232.727
232.779
232.543
226.265
225.029
222.015
217.106
216.426
217.724
210.604
209.630
203.507
197.521
204.448
196.809

Without the patch, as seen above, the first run is about 80% slower.
With the patch applied, the first run is about 25% slower than the
steady state, and subsequent scans decline steadily from there. Runs
21 and following flush no further data and run at full speed. These
numbers aren't representative of all real-world scenarios, though.
On a system with many concurrent clients, CLOG contention might be an
issue. On the flip side, if this table were larger than RAM (not just
larger than shared_buffers), the decrease in write traffic as we scan
through the table might be a more significant benefit than it is
here, where the cost is mostly kernel time and the I/O system isn't
actually taxed. So I think this probably needs more testing before we
decide whether or not it's a good idea.
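
For anyone who wants to redo the percentage arithmetic, here is a
minimal Python sketch. The helper name slowdown_pct is made up for
illustration and is not part of the patch or of PostgreSQL. Treating
the 197-206 ms range from the unpatched retest as the steady state for
both cases is an assumption, since the runs were not tightly controlled.

```python
# Hypothetical helper (not part of the patch or of PostgreSQL): percentage
# by which a first-scan time exceeds a steady-state time.
def slowdown_pct(first_ms, steady_ms):
    return (first_ms / steady_ms - 1.0) * 100.0

# Figures quoted in this message, in ms.
STEADY_RANGE = (197.0, 206.0)   # steady-state range from the unpatched retest
UNPATCHED_FIRST = 360.0         # first scan without the patch
PATCHED_FIRST = 257.108         # first scan with the patch

for label, first in (("unpatched", UNPATCHED_FIRST), ("patched", PATCHED_FIRST)):
    # Comparing against the slow end of the range gives the smaller slowdown,
    # comparing against the fast end gives the larger one.
    low = slowdown_pct(first, max(STEADY_RANGE))
    high = slowdown_pct(first, min(STEADY_RANGE))
    print(f"{label}: first scan {low:.0f}%-{high:.0f}% slower than steady state")
```

The spread between the two ends of each range is a reminder that the
quoted percentages are rough; which steady-state figure you pick moves
the answer by several points.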

I adopted a few suggestions made previously in this version of the
patch. Tom Lane recommended not messing with BM_JUST_DIRTY and
leaving that for another day. I did that. Also, per my previous
musings, I've adjusted this version so that vacuum behaves differently
when dirtying pages rather than when flushing them. In versions 1 and
2, vacuum would always write pages that were dirty-only-for-hint-bits
when allocating a new buffer; in this version the buffer allocation
logic is the same for vacuum, but it marks pages dirty even when only
hint bits have changed. The result is that VACUUM followed by
CHECKPOINT is enough to make sure all hint bits are set on disk, just
as is the case today.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
bm-hint-bits-v3.patch text/x-diff 14.3 KB
