Re: checkpointer continuous flushing

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-08-23 07:03:47
Message-ID: alpine.DEB.2.10.1508230812470.29146@sto
Lists: pgsql-hackers


Hello Amit,

> I have tried your scripts and found some problem while using avg.py
> script.
> grep 'progress:' test_medium4_FW_off.out | cut -d' ' -f4 | ./avg.py
> --limit=10 --length=300
> : No such file or directory

> I didn't get chance to poke into avg.py script (the command without
> avg.py works fine). Python version on the m/c, I planned to test is
> Python 2.7.5.

Strange... What does "/usr/bin/env python" say? Can the script be started
on its own at all? I think that the script should work both with python2
and python3, at least it does on my laptop...
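
One possible cause worth ruling out (an assumption, nothing in the report
confirms it): a bare ": No such file or directory" is typically what the
shell prints when the "#!" line ends with a carriage return, e.g. after the
script went through a Windows editor or mail client, because env then looks
for an interpreter named "python\r". A minimal sketch to check for that; the
helper names are hypothetical and not part of avg.py:

```python
def has_crlf_shebang(first_line: bytes) -> bool:
    """True if the line is a shebang that ends with a carriage return."""
    if not first_line.startswith(b"#!"):
        return False
    # Strip only the newline, then look for a trailing "\r".
    return first_line.rstrip(b"\n").endswith(b"\r")


def script_has_crlf_shebang(path: str) -> bool:
    """Read the first line of a script in binary mode and test it."""
    with open(path, "rb") as f:
        return has_crlf_shebang(f.readline())
```

For instance, script_has_crlf_shebang("avg.py") returning True would suggest
converting the file to Unix line endings before retrying.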

> Today while reading the first patch (checkpoint-continuous-flush-10-a),
> I have given some thought to below part of patch which I would like
> to share with you.
>
> + * Select a tablespace depending on the current overall progress.
> + *
> + * The progress ratio of each unfinished tablespace is compared to
> + * the overall progress ratio to find one with is not in advance
> + * (i.e. overall ratio > tablespace ratio,
> + * i.e. tablespace written/to_write > overall written/to_write

> Here, I think above calculation can go for toss if backend or bgwriter
> starts writing buffers when checkpoint is in progress. The tablespace
> written parameter won't be able to consider the one's written by backends
> or bgwriter.

Sure... This is *already* the case with the current checkpointer: the
schedule is computed with respect to the initial number of buffers it
thinks it will have to write, and if someone else writes these buffers then
the schedule is skewed a little bit, or more... I have not changed this
logic, but I extended it to handle several tablespaces.
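
The selection rule from the quoted comment can be illustrated with a small
Python sketch. This is not the patch's C code: TablespaceProgress,
pick_tablespace and simulate are hypothetical names, and the sketch assumes
the rule amounts to "pick an unfinished tablespace whose written/to_write
ratio is not ahead of the overall ratio". It picks the most lagging
unfinished tablespace, which by construction is never ahead of the overall
ratio, since the overall ratio is a weighted average of the per-tablespace
ratios.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TablespaceProgress:
    name: str
    to_write: int      # buffers planned for this tablespace at checkpoint start
    written: int = 0   # buffers the checkpointer itself has written so far


def pick_tablespace(spaces: List[TablespaceProgress]) -> Optional[TablespaceProgress]:
    """Return the unfinished tablespace with the lowest progress ratio."""
    unfinished = [s for s in spaces if s.written < s.to_write]
    if not unfinished:
        return None
    # The lowest ratio is always <= the overall written/to_write ratio.
    return min(unfinished, key=lambda s: s.written / s.to_write)


def simulate(spaces: List[TablespaceProgress]) -> List[str]:
    """Write one buffer at a time and record which tablespace was chosen."""
    order = []
    while True:
        chosen = pick_tablespace(spaces)
        if chosen is None:
            return order
        chosen.written += 1
        order.append(chosen.name)
```

With two tablespaces of 2 and 4 planned buffers, the writes interleave so
that both finish at about the same time instead of one after the other. Note
that written only counts the checkpointer's own writes, which is exactly why
writes by backends or the bgwriter skew the schedule.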

If this (the checkpointer progress evaluation used for its schedule is
sometimes wrong because of other writes) is proven to be a major
performance issue, then the processes which write the checkpointed
buffers behind its back should tell the checkpointer about it, probably
with some shared data structure, so that the checkpointer can adapt its
schedule.
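
As a rough illustration of what such a shared structure could look like,
here is a hedged Python sketch. Everything in it is hypothetical
(CheckpointProgress and its methods do not exist in PostgreSQL), and a
thread lock stands in for whatever shared-memory synchronization a real
implementation would need.

```python
import threading


class CheckpointProgress:
    """Hypothetical shared counters for one checkpoint."""

    def __init__(self, to_write: int) -> None:
        self._lock = threading.Lock()
        self.to_write = to_write            # buffers planned at checkpoint start
        self.written_by_checkpointer = 0
        self.written_by_others = 0          # backends or bgwriter

    def checkpointer_wrote(self, n: int = 1) -> None:
        with self._lock:
            self.written_by_checkpointer += n

    def other_process_wrote(self, n: int = 1) -> None:
        # Called by a process that writes a checkpointed buffer itself.
        with self._lock:
            self.written_by_others += n

    def naive_ratio(self) -> float:
        # Progress as seen without reporting: only the checkpointer's writes.
        with self._lock:
            return self.written_by_checkpointer / self.to_write

    def adjusted_ratio(self) -> float:
        # Progress once writes done behind the checkpointer's back are counted.
        with self._lock:
            done = self.written_by_checkpointer + self.written_by_others
            return done / self.to_write
```

If 30 of 100 planned buffers were written by the checkpointer and 20 by
other processes, the naive ratio is 0.3 while the adjusted one is 0.5, so a
schedule based on the adjusted value would not rush to catch up on work that
is already done.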

This is an independent issue, which may be worth addressing some day. My
opinion is that when the bgwriter or backends kick in to write buffers,
they are basically generating random I/Os on HDD and killing tps and
latency, so it is a very bad time anyway, thus I'm not sure that this is
the next problem to address to improve pg performance and responsiveness.

> Now it may not big thing to worry but I find Heikki's version worth
> considering, he has not changed the overall idea of this patch, but the
> calculations are somewhat simpler and hence less chance of going wrong.

I do not think that Heikki's version worked wrt balancing writes over
tablespaces, and I'm not sure it worked at all. However I reused some of
his ideas to simplify and improve the code.

--
Fabien.
