
Re: checkpointer continuous flushing - V18

From:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpointer continuous flushing - V18
Date:	2016-03-07 20:10:19
Message-ID:	alpine.DEB.2.10.1603072043070.13457@sto
Lists:	pgsql-hackers

Hello Andres,

>>>> (1) with 16 tablespaces (1 per table) on 1 disk : 680.0 tps
>>>> per second avg, stddev [ min q1 median q3 max ] <=300tps
>>>> 679.6 ± 750.4 [0.0, 317.0, 371.0, 438.5, 2724.0] 19.5%
>>>>
>>>> (2) with 1 tablespace on 1 disk : 956.0 tps
>>>> per second avg, stddev [ min q1 median q3 max ] <=300tps
>>>> 956.2 ± 796.5 [3.0, 488.0, 583.0, 742.0, 2774.0] 2.1%
>
> Well, that's not a particularly meaningful workload. You increased the
> number of flushes to the same number of disks considerably.

It is just a simple workload designed to emphasize the effect of having
one context shared by all tablespaces instead of one per tablespace,
without rewriting the patch and without a large host with multiple disks.

> For a meaningful comparison you'd have to compare using one writeback
> context for N tablespaces on N separate disks/raids, and using N
> writeback contexts for the same.

Sure, it would be better to do that, but that would require (1) rewriting
the patch, which is a small amount of work, and also (2) having access to
a machine with a number of disks/raids, which I do NOT have available.

What happens in the 16-tablespace workload is that much smaller flushes
are performed on the 16 files written in parallel, so tps is significantly
degraded, even though the writes are sorted within each file. With one
tablespace, all flushed buffers are in the same file, so each flush covers
longer sequential runs and is much more effective.

When the context is shared and the checkpointer's buffer writes are
balanced across tablespaces, then by the time the flush limit is reached
the pending flush holds only a few buffers per tablespace. This limits
each sequential write to a few buffers, hence the performance degradation.

So I can explain the performance degradation *because* the flush context
is shared between the tablespaces, which is a logical argument backed by
experimental data, so it is better than handwaving. Given the available
hardware, this is the best evidence I can offer that the context should be
per tablespace.

Now I cannot see how having one context per tablespace would have a
significant negative performance impact.

So the logical conclusion for me is that, without further experimental
data, it is better to have one context per tablespace.

If you have hardware with plenty of disks available for testing, that
would provide better data, obviously.

--
Fabien.

In response to

Re: checkpointer continuous flushing - V18 at 2016-03-07 18:05:58 from Andres Freund

Responses

Re: checkpointer continuous flushing - V18 at 2016-03-07 21:13:47 from Andres Freund
