
Re: limiting hint bit I/O

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: limiting hint bit I/O
Date: 2011-02-05 20:07:30
Lists: pgsql-hackers
2011/2/5 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain
> <cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
>> Please update the commitfest with the accurate patch, there is only
>> the old immature v1 of the patch in it.
>> I was about to review it...
> Woops, sorry about that.  Here's an updated version, which I will also
> add to the CommitFest application.
> The need for this patch has been somewhat ameliorated by the fsync
> queue compaction patch.  I tested with:
> create table s as select g,
> random()::text||random()::text||random()::text||random()::text from
> generate_series(1,1000000) g;
> checkpoint;
> The table was large enough not to fit in shared_buffers.  Then, repeatedly:
> select sum(1) from s;
> At the time I first posted this patch, running against git master, the
> first run took about 1600 ms vs. ~207-216 ms for subsequent runs.  But
> that was actually running up against the fsync queue problem.
> Retesting today, the first run took 360 ms, and subsequent runs took
> 197-206 ms.  I doubt that the difference in the steady-state is
> significant, since the tests were done on different days and not
> controlled all that carefully, but clearly the response time spike for
> the first scan is far lower than previously.  Setting the log level to
> DEBUG1 revealed that the first scan did two fsync queue compactions.
> The patch still does help to smooth things out, though.  Here are the
> times for one series of selects, with the patch applied, after setting
> up as described above:
> 257.108
> 259.245
> 249.181
> 245.896
> 250.161
> 241.559
> 240.538
> 241.091
> 232.727
> 232.779
> 232.543
> 226.265
> 225.029
> 222.015
> 217.106
> 216.426
> 217.724
> 210.604
> 209.630
> 203.507
> 197.521
> 204.448
> 196.809
> Without the patch, as seen above, the first run is about ~80% slower.
> With the patch applied, the first run is about 25% slower than the
> steady state, and subsequent scans decline steadily from there.  Runs
> 21 and following flush no further data and run at full speed.  These
> numbers aren't representative of all real-world scenarios, though.
> On a system with many concurrent clients, CLOG contention might be an
> issue; on the flip side, if this table were larger than RAM (not just
> larger than shared_buffers) the decrease in write traffic as we scan
> through the table might actually be a more significant benefit than it
> is here, where it's mostly a question of kernel time; the I/O system
> isn't actually taxed.  So I think this probably needs more testing
> before we decide whether or not it's a good idea.

I *may* have an opportunity to test that in a real-world application
where this hint bit I/O was an issue.
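As a quick sanity check on the quoted figures, the patched-run timings can be summarized directly. Which runs count as the "steady state" is not spelled out above; taking runs 21 and following (the ones Robert says flush no further data) as the baseline is an assumption, and it yields a first-run slowdown in the neighborhood of the quoted ~25%:

```python
# Timings (ms) quoted above for successive "select sum(1) from s"
# runs with the patch applied.
runs = [257.108, 259.245, 249.181, 245.896, 250.161, 241.559,
        240.538, 241.091, 232.727, 232.779, 232.543, 226.265,
        225.029, 222.015, 217.106, 216.426, 217.724, 210.604,
        209.630, 203.507, 197.521, 204.448, 196.809]

# Assumed baseline: runs 21+ flush no further data ("full speed").
steady = sum(runs[20:]) / len(runs[20:])
slowdown = runs[0] / steady - 1
print(f"first run is {slowdown:.0%} slower than the steady state")
```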

> I adopted a few suggestions made previously in this version of the
> patch.  Tom Lane recommended not messing with BM_JUST_DIRTY and
> leaving that for another day.

yes, good.

> I did that.  Also, per my previous
> musings, I've adjusted this version so that vacuum behaves differently
> when dirtying pages rather than when flushing them.  In versions 1 and
> 2, vacuum would always write pages that were dirty-only-for-hint-bits
> when allocating a new buffer; in this version the buffer allocation
> logic is the same for vacuum, but it marks pages dirty even when only
> hint bits have changed.  The result is that VACUUM followed by
> CHECKPOINT is enough to make sure all hint bits are set on disk, just
> as is the case today.

For now it looks better to reduce this impact, yes.
Keeping the logic from v1 or v2 would imply a VACUUM FREEZE to 'fix'
the hint bits, right?
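The VACUUM-then-CHECKPOINT behaviour Robert describes can be sketched in psql (table name `s` taken from the test case above; assumes a server with the v3 patch applied):

```sql
-- Under the v3 patch, VACUUM marks pages dirty even when only hint
-- bits changed, so the following sequence persists all hint bits to
-- disk, just as on an unpatched server.
VACUUM s;      -- sets hint bits and dirties the pages
CHECKPOINT;    -- flushes the dirty buffers, writing the hint bits out
```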

Cédric Villemain
2ndQuadrant
PostgreSQL : Expertise, Formation et Support
