Re: checkpointer continuous flushing

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-06-25 04:53:59
Message-ID: alpine.DEB.2.10.1506250632170.3535@sto
Lists: pgsql-hackers


Hello Amit,

>> [...]
>> Ok, I misunderstood your question. I thought you meant a dip between 1
>> client and 4 clients. I meant that when flush is turned on tps goes down by
>> 8% (743 to 681 tps) on this particular run.
>
> This 8% might matter if the dip is bigger with more clients and
> more aggressive workload. Do you know what could lead to this
> dip, because if we know what is the reason then it will be more
> predictable to know if this is the max dip that could happen or it
> could lead to bigger dip in other cases.

I do not know the cause of the dip, or whether it would increase with
more clients. I do not have a box for such tests. If someone can provide
the box, I can provide test scripts:-)

The first measure, although higher, is really very unstable, with pg
totally unresponsive (offline, really) at times.

I think that the flush option may always have a risk of (small)
detrimental effects on tps, because there are two steady states: one with
pg only doing wal-logged transactions with great tps, and one with pg
doing the checkpoint at nought tps. If this is on the same disk, even at
best the combination means that probably each operation will hamper the
other one a little bit, so the combined tps performance would/could be
lower than doing one after the other and having pg offline 50% of the
time...
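
To make the argument above concrete, here is a toy model, not a
measurement. All names and numbers in it (peak_tps, checkpoint_share,
disk_efficiency) are hypothetical assumptions chosen for illustration
only: the checkpoint is assumed to need a fixed share of the disk, and
interleaving the two kinds of IO is assumed to cost some efficiency.

```python
# Toy model of the two steady states sharing one disk.
# Every parameter here is a hypothetical assumption, not measured data.

def sequential_tps(peak_tps, checkpoint_share):
    """Average tps when pg alternates between full-speed transactions
    and being offline (zero tps) while the checkpoint runs."""
    return peak_tps * (1.0 - checkpoint_share)

def concurrent_tps(peak_tps, checkpoint_share, disk_efficiency):
    """Average tps when both run at once on the same disk.
    disk_efficiency < 1 models each operation hampering the other."""
    return peak_tps * max(0.0, disk_efficiency - checkpoint_share)

if __name__ == "__main__":
    peak, share = 1000.0, 0.5  # hypothetical: pg offline 50% of the time
    print("one after the other:", sequential_tps(peak, share))
    print("combined, no interference:", concurrent_tps(peak, share, 1.0))
    print("combined, 10% interference:", concurrent_tps(peak, share, 0.9))
```

Under these assumptions the combined case can at best match the
alternating case on average tps, and falls below it as soon as there is
any interference, while avoiding the offline periods.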

Please also note that this 8% "dip" still leaves 681 tps (with the dip)
vs 198 tps (no options at all), a x3.4 improvement compared to pg's
current behavior.
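
For reference, the two ratios can be rechecked from the tps figures
quoted in this thread. This is only a small arithmetic sketch; the
function and constant names are made up for the illustration.

```python
# tps figures quoted in this thread, for this particular run.
TPS_FLUSH_OFF = 743   # before turning flush on
TPS_FLUSH_ON = 681    # after turning flush on (with the dip)
TPS_NO_OPTIONS = 198  # no options at all

def relative_dip(before, after):
    """Fraction of throughput lost when going from `before` to `after`."""
    return (before - after) / before

def speedup(new, baseline):
    """How many times faster `new` is compared to `baseline`."""
    return new / baseline

if __name__ == "__main__":
    print(f"dip: {relative_dip(TPS_FLUSH_OFF, TPS_FLUSH_ON):.1%}")
    print(f"improvement: x{speedup(TPS_FLUSH_ON, TPS_NO_OPTIONS):.1f}")
```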

>> Basically tps improvements mostly come from "sort", and "flush" has
>> uncertain effects on tps (throughput), but much more on latency and
>> performance stability (lower late rate, lower standard deviation).
>
> I agree that performance stability is important, but not sure if it
> is good idea to sacrifice the throughput for it.

See discussion above. I think better stability may imply slightly lower
throughput on some loads. That is why there are options, and DBAs to
choose them:-)

> If sort + flush always gives better results, then isn't it better to
> perform these actions together under one option.

Sure, but that is not currently the case. Also the two options do quite
orthogonal things, so I would tend to keep them separate. If one turns
out to be always beneficial and should always be activated, then its
option could be removed.

>> Hmmm. My point of view is still that the logical priority is to optimize
>> for disk IO first, then look for compatible RAM optimisations later.
>
> It is not only about RAM optimisation which we can do later, but also
> about avoiding regression in existing use-cases.

Hmmm. So far I have not seen really significant regressions. I have
seen a somewhat negative impact of some options on some loads.

--
Fabien.
