Re: checkpointer continuous flushing

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-08-09 07:42:06
Message-ID: alpine.DEB.2.10.1508090837300.6511@sto
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello Heikki,

Thanks for having a look at the patch.

> * I think we should drop the "flush" part of this for now. It's not as
> clearly beneficial as the sorting part, and adds a great deal more code
> complexity. And it's orthogonal to the sorting patch, so we can deal with it
> separately.

I agree that it is orthogonal and that the two features could be in
distinct patches. The flush part is the first patch I really submitted
because it has significant effect on latency, and I was told to mix it
with sorting...

The flushing part really helps to keep "write stalls" under control in
many cases, for instance:

- 400-tps 1-client (or 4 for medium) max 100-ms latency

options | percent of late transactions
flush | sort | tiny | small | medium
off | off | 12.0 | 64.28 | 68.6
off | on | 11.3 | 22.05 | 22.6
on | off | 1.1 | 67.93 | 67.9
on | on | 0.6 | 3.24 | 3.1

The "percent of late transactions" is really the fraction of time the
database is unreachable because of write stalls... So sort without flush
is cleary not enough.

Another thing suggested by Andres is to fsync as early as possible, but
this is not a simple patch because is intermix things which are currently
in distinct parts of checkpoint processing, so I already decided that this
would be for another submission.

> * Is it really necessary to parallelize the I/O among tablespaces? I can see
> the point, but I wonder if it makes any difference in practice.

I think that if someone bothers with tablespace there is no reason to kill
them behind her. Without sorting you may hope that tablespaces will be
touched randomly enough, but once buffers are sorted you can probably find
cases where it would write on one table space and then on the other.

So I think that it really should be kept.

> * Is there ever any harm in sorting the buffers? The GUC is useful for
> benchmarking, but could we leave it out of the final patch?

I think that the performance show that it is basically always beneficial,
so the guc may be left out. However on SSD it is unclear to me whether it
is just a loss of time or whether it helps, say with wear-leveling. Maybe
best to keep it? Anyway it is definitely needed for testing.

> * Do we need to worry about exceeding the 1 GB allocation limit in
> AllocateCheckpointBufferIds? It's enough got 2 TB of shared_buffers. That's a
> lot, but it's not totally crazy these days that someone might do that. At the
> very least, we need to lower the maximum of shared_buffers so that you can't
> hit that limit.


> I ripped out the "flushing" part, keeping only the sorting. I refactored
> the logic in BufferSync() a bit. There's now a separate function,
> nextCheckpointBuffer(), that returns the next buffer ID from the sorted
> list. The tablespace-parallelization behaviour in encapsulated there,

I do not understand the new tablespace-parallelization logic: there is no
test about the tablespace of the buffer in the selection process... Note
that I did wrote a proof for the one I put, and also did some detailed
testing on the side because I'm always wary of proofs, especially mines:-)

I notice that you assume that table space numbers are always small and
contiguous. Is that a fact? I was feeling more at ease with relying on a
hash table to avoid such an assumption.

> keeping the code in BufferSync() much simpler. See attached. Needs some
> minor cleanup and commenting still before committing, and I haven't done
> any testing besides a simple "make check".

Hmmm..., just another detail, the patch does not sort:

+ if (checkpoint_sort && num_to_write > 1 && false)

I'll resubmit a patch with only the sorting part, and do the kind of
restructuring you suggest which is a good thing.


In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Vladimir Koković 2015-08-09 08:00:11 make check-world problem
Previous Message Satoshi Nagayasu 2015-08-09 05:36:23 Re: Assert in pg_stat_statements