Re: checkpointer continuous flushing

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2016-03-18 08:07:38
Message-ID: alpine.DEB.2.10.1603180856300.31871@sto
Lists: pgsql-hackers


Hello Tomas,

> But I do think it's a very useful tool when it comes to measuring the
> consistency of behavior over time, assuming you're asking questions
> about the intervals and not the original transactions.

For a throttled run, I think it is better to check whether or not the
system could handle the load "as expected", i.e. with reasonable latency.
So I am interested in the "original transactions" as scheduled by
the client, and in whether they were processed efficiently; these must then
be aggregated by interval to get some statistics.

> For example, had there been intervals with vastly different transaction
> rates, we'd see that on the tps charts (i.e. the chart would be much more
> gradual or wobbly, just like the "unpatched" one). Or if there were intervals
> with much higher variance of latencies, we'd see that on the STDDEV chart.

On HDDs what happens is that transactions are blocked (frozen): the tps
is very low and the latency very high. But with so few transactions (even
1 or 0 at a time), all with latencies that are very bad yet close to one
another, the resulting stddev may be quite small anyway.
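
To illustrate the stddev point, here is a minimal sketch with made-up
latencies (the values and the names "healthy" and "stalled" are
assumptions for illustration, not measurements from the benchmark). The
stalled interval has a mean latency thousands of times worse, yet a
smaller standard deviation:

```python
import statistics

# Hypothetical per-interval latencies in milliseconds (illustrative values only).
healthy = [2.1, 1.8, 2.5, 3.0, 1.9, 2.2, 2.7, 2.0]  # many fast transactions
stalled = [5000.0, 5000.4]                           # two transactions, both stuck ~5 s


def summarize(latencies):
    """Return (count, mean, population stddev) for one interval."""
    return (
        len(latencies),
        statistics.fmean(latencies),
        statistics.pstdev(latencies),
    )


healthy_stats = summarize(healthy)
stalled_stats = summarize(stalled)

print("healthy: n=%d mean=%.2f ms stddev=%.2f ms" % healthy_stats)
print("stalled: n=%d mean=%.2f ms stddev=%.2f ms" % stalled_stats)
```

So a flat stddev chart does not by itself rule out stalls; the transaction
count and the mean of each interval have to be read alongside it.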

> I'll consider repeating the benchmark and logging some reasonable sample of
> transactions

Beware that this measure is skewed: on HDDs, when the system is
stuck, it is stuck on very few transactions which are waiting, and they
would seldom show up in the statistics as there are very few of them. That
is why I'm interested in those that could not make it, hence my interest in
the --latency-limit option, which reports just that.
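
The skew can be sketched with a deliberately simplified model. The
assumptions are mine, for illustration: a 4000 tps target, a system stuck
during 25% of one-second intervals, exactly one transaction completing in
each stuck interval, and no catch-up afterwards. Counting from the log of
completed transactions and counting from the client's schedule then give
very different pictures:

```python
# Hypothetical throttled run: 100 one-second intervals at a 4000 tps target.
TARGET_TPS = 4000
intervals = ["ok"] * 75 + ["stalled"] * 25  # assumed: stuck 25% of the time

completed_fast = 0  # transactions that completed quickly
completed_slow = 0  # transactions that completed, but very late
scheduled = 0       # transactions the client intended to run

for state in intervals:
    scheduled += TARGET_TPS
    if state == "ok":
        completed_fast += TARGET_TPS
    else:
        completed_slow += 1  # assumed: only one transaction gets through

# View from the log of completed transactions: slow ones are almost invisible.
slow_share_of_log = completed_slow / (completed_fast + completed_slow)

# View from the schedule: a quarter of the intended work was not done on time.
missed_share_of_schedule = (scheduled - completed_fast) / scheduled

print(f"slow share of logged transactions: {slow_share_of_log:.6f}")
print(f"missed share of scheduled load:    {missed_share_of_schedule:.2f}")
```

Any sample drawn from the completed transactions inherits the first figure,
which is why sampling the log under-reports the stalls.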

>>> So I don't think this would make any measurable difference in practice.
>>
>> I think that it may show that 25% of the time the system could not
>> match the target tps, even if it can handle much more on average, so
>> the tps achieved when discarding late transactions would be under
>> 4000 tps.
>
> You mean the 'throttled-tps' chart?

Yes.

> Yes, that one shows that without the patches, there's a lot of intervals
> where the tps was much lower - presumably due to a lot of slow
> transactions.

Yep. That is what is measured with the latency limit option, by counting
the dropped transactions that were not processed in a timely manner.
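
As a rough sketch of that accounting (not pgbench's actual code or log
format: the record layout, the function name on_time_by_interval and the
100 ms limit are hypothetical), each scheduled transaction can be assigned
to the interval in which it was scheduled and counted as on time only if it
ran and finished within the limit:

```python
from collections import defaultdict


def on_time_by_interval(records, limit_ms, interval_s=1.0):
    """Aggregate scheduled vs. on-time transactions per interval.

    records: iterable of (scheduled_time_s, latency_ms), where latency_ms
    is None for a transaction that was dropped and never executed.
    """
    stats = defaultdict(lambda: {"scheduled": 0, "on_time": 0})
    for scheduled_s, latency_ms in records:
        bucket = int(scheduled_s // interval_s)
        stats[bucket]["scheduled"] += 1
        if latency_ms is not None and latency_ms <= limit_ms:
            stats[bucket]["on_time"] += 1
    return dict(stats)


# Hypothetical records: interval 1 contains a late and a dropped transaction.
records = [
    (0.1, 3.0), (0.5, 4.0), (0.9, 2.0),
    (1.2, 950.0), (1.6, None),
    (2.3, 5.0),
]
result = on_time_by_interval(records, limit_ms=100.0)

for bucket in sorted(result):
    s = result[bucket]
    print(f"interval {bucket}: {s['on_time']}/{s['scheduled']} on time")
```

The on-time count per interval is then the "tps achieved when discarding
late transactions" for that interval.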

--
Fabien.
