Re: checkpointer continuous flushing

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2016-01-07 21:28:02
Message-ID: alpine.DEB.2.10.1601072154420.24114@sto
Lists: pgsql-hackers


Hello Andres,

>> Hmmm. What I understood is that the workloads that have some performance
>> regressions (regressions that I have *not* seen in the many tests I ran) are
>> not due to checkpointer IOs, but rather in settings where most of the writes
>> is done by backends or bgwriter.
>
> As far as I can see you've not run many tests where the hot/warm data
> set is larger than memory (the full machine's memory, not
> shared_buffers).

Indeed, I think I ran some, but not many with such characteristics.

> That quite drastically alters the performance characteristics here,
> because you suddenly have lots of synchronous read IO thrown into the
> mix.

If I understand this point correctly...

I would expect the overall performance to be abysmal in such a situation
because you get only intermixed *random* reads and writes: as you point
out, synchronous *random* reads (very slow), but on the write side the
IOs are mostly random as well on the checkpointer side, because there is
not much to aggregate to get sequential writes.

Now why would that degrade performance significantly? For me it should
render the sorting/flushing less and less effective, and it would go back
to the previous performance levels...

Or maybe it is only the flushing itself which degrades performance, as you
point out, because then you have some synchronous (synced) writes as well
as reads, as opposed to just the reads you had before without the patch.

If this is indeed the issue, then the way to avoid the regression is
*not* to flush, so that the OS IO scheduler is less constrained in its job
and can be slightly more effective (well, we are talking about abysmal
random IO disk performance here, so "effective" would mean somewhere
between slightly more and slightly less very very very bad).

Maybe a trick could be not to aggregate and flush when buffers in the same
file are too far apart anyway, for instance based on some threshold?
This can be implemented locally when deciding whether to merge buffer
flushes, and whether to flush at all, so it would fit the current code
quite simply.
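
To make the threshold idea concrete, here is a minimal sketch. It is not
taken from the patch: the FlushRange type, the function names and the
max_gap / min_blocks parameters are all hypothetical, chosen only for
illustration. It assumes the dirty block numbers of one file are already
sorted, merges neighbours that are at most max_gap blocks apart, and lets
the caller skip the flush hint for ranges that stay too small.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only; not PostgreSQL code. */
typedef struct FlushRange
{
    unsigned first_block;   /* first block of the range */
    unsigned nblocks;       /* blocks spanned, including any clean gaps */
} FlushRange;

/*
 * Coalesce the sorted dirty block numbers of one file into ranges.
 * A block joins the current range only if at most max_gap clean blocks
 * separate it from the end of that range; otherwise a new range starts.
 * "out" must have room for n entries. Returns the number of ranges.
 */
size_t
coalesce_blocks(const unsigned *blocks, size_t n, unsigned max_gap,
                FlushRange *out)
{
    size_t nranges = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (nranges > 0)
        {
            FlushRange *last = &out[nranges - 1];
            unsigned end = last->first_block + last->nblocks; /* one past */

            if (blocks[i] < end)
                continue;               /* duplicate block number */
            if (blocks[i] - end <= max_gap)
            {
                /* close enough: extend the current range */
                last->nblocks = blocks[i] - last->first_block + 1;
                continue;
            }
        }
        /* too far from the previous range (or first block): new range */
        out[nranges].first_block = blocks[i];
        out[nranges].nblocks = 1;
        nranges++;
    }
    return nranges;
}

/* Only issue a flush hint for ranges spanning at least min_blocks. */
int
range_worth_flushing(const FlushRange *r, unsigned min_blocks)
{
    return r->nblocks >= min_blocks;
}
```

With blocks 1, 2, 3, 10, 11, 500 and max_gap = 0 this yields three ranges,
and the isolated block 500 would not be flushed for min_blocks = 2. Good
values for both thresholds would have to come from measurements.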

Now my understanding of the sync_file_range call is that it is a hint
to flush the stuff, but it is still asynchronous in nature, so whether it
would impact performance that badly depends on the OS IO scheduler. Also,
I would like to check whether, under the "regressed performance" that you
observed (in tps terms), pg is more or less responsive. It could be that
the average performance is better but pg is offline longer on fsync. In
which case, I would consider it better to have lower tps in such cases
*if* pg responsiveness is significantly improved.
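
For reference, a flush hint of this kind can be sketched as below. This is
a Linux-only illustration, not the code of the patch: hint_writeback is a
hypothetical helper name. SYNC_FILE_RANGE_WRITE only initiates writeback
of the dirty pages in the range and does not wait for completion, although
the man page notes that the call can still block when the device request
queue is full.

```c
#define _GNU_SOURCE             /* exposes sync_file_range() in <fcntl.h> */
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not PostgreSQL code): ask the kernel to start
 * writing back the dirty pages in [offset, offset + nbytes) of fd,
 * without waiting for the writes to complete.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
hint_writeback(int fd, off_t offset, off_t nbytes)
{
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}
```

Note that such a hint gives no durability guarantee by itself; the later
fsync is still what makes the data safe.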

Would you have these measures for the regression runs you observed?
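
As an illustration of the kind of measure meant here, the sketch below
compares average tps with the fraction of one-second intervals in which
the server stayed above some minimum tps. The function names and the
min_tps threshold are assumptions made for the example; the per-second
figures could come, for instance, from pgbench progress reports (-P 1).

```c
#include <stddef.h>

/* Average of the per-second tps samples (0 if there are none). */
double
mean_tps(const double *tps, size_t n)
{
    double sum = 0.0;

    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        sum += tps[i];
    return sum / (double) n;
}

/*
 * Fraction of one-second intervals whose tps reached at least min_tps.
 * A low value means the server was often stalled, e.g. during fsync.
 */
double
responsive_fraction(const double *tps, size_t n, double min_tps)
{
    size_t ok = 0;

    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        if (tps[i] >= min_tps)
            ok++;
    return (double) ok / (double) n;
}
```

A run with samples 100, 100, 0, 0 has a higher mean than a run with a
steady 45 tps, yet with min_tps = 40 it is responsive only half of the
time, which is the trade-off described above.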

> Whether it's bgwriter or not I've not fully been able to establish, but
> it's a working theory.

Ok, that is something to check, either to confirm or to refute it.

Given the above discussion, I think my suggestion may be wrong: as the tps
is low because of random read/write accesses, not many buffers are
modified (so the bgwriter/backends won't need to make space), and the
checkpointer does not have much to write (good), *but* all of it is random
(bad).

>> I do not see the point of rewriting the checkpointer for them, although
>> obviously I agree that something has to be done also for the other
>> processes.
>
> Rewriting the checkpointer and fixing the flush interface in a more
> generic way aren't the same thing at all.

Hmmm, probably I misunderstood something in the discussion. It started
with an implementation strategy, but it drifted into discussing a
performance regression. I agree that these are two different subjects.

--
Fabien.
