Re: checkpointer continuous flushing

From: Andres Freund <andres(at)anarazel(dot)de>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-08-10 19:28:14
Message-ID: 8A50D815-1536-48FA-ABD3-D2B50E49EF8D@anarazel.de
Lists: pgsql-hackers

On August 10, 2015 8:24:21 PM GMT+02:00, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
>Hello Andres,
>
>> You can't allocate 4GB with palloc(), it has a builtin limit against
>> allocating more than 1GB.
>
>Argh, too bad, I assumed very naively that palloc was malloc in
>disguise.

It is, but there's some layering (memory pools/contexts) on top. You can get huge allocations with palloc_huge.

>Then the file would be fsynced twice: if the fsync is done properly
>(data have already been flushed to disk) then it would not cost much,
>and doing it sometimes twice on some file would not be a big issue.
>The code could also detect such event and log a warning, which would
>give a hint about how often it occurs in practice.

Right. At the cost of keeping track of all files...
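
A minimal sketch of what that tracking could look like, assuming files are identified by a plain integer id. SyncTracker, tracker_note_fsync and MAX_TRACKED_FILES are hypothetical names for illustration, not anything in the patch. The point is only that detecting a repeated fsync needs one entry per file seen so far:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_TRACKED_FILES 1024	/* arbitrary cap, assumption for the sketch */

/* Hypothetical bookkeeping: one slot per file already fsynced. */
typedef struct SyncTracker
{
	uint32_t	synced[MAX_TRACKED_FILES];	/* ids of files already fsynced */
	int			nsynced;					/* number of used slots */
	int			repeats;					/* count of repeated fsyncs seen */
} SyncTracker;

static void
tracker_init(SyncTracker *t)
{
	t->nsynced = 0;
	t->repeats = 0;
}

/*
 * Record that file_id was just fsynced.  Returns true if the file had
 * already been fsynced, which is where a warning could be logged.
 */
static bool
tracker_note_fsync(SyncTracker *t, uint32_t file_id)
{
	for (int i = 0; i < t->nsynced; i++)
	{
		if (t->synced[i] == file_id)
		{
			t->repeats++;
			return true;
		}
	}

	/* First fsync of this file: remember it if there is room left. */
	if (t->nsynced < MAX_TRACKED_FILES)
		t->synced[t->nsynced++] = file_id;
	return false;
}
```

The linear scan is only for brevity; the memory for one entry per file is the cost referred to above.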

>>>> If the pivot element changes its identity won't the result be
>>>> pretty much random?
>>>
>>> That would be a very unlikely event, given the short time spent in
>>> qsort.
>>
>> Meh, we don't want to rely on "likeliness" on such things.
>
>My main argument is that even if it occurs, and the qsort result is
>partly wrong, it does not change correctness, it just mean that the
>actual writes will be less in order than wished. If it occurs, one
>pivot separation would be quite strange, but then others would be
>right, so the buffers would be "partly sorted".

It doesn't matter for correctness today, correct. But it makes it impossible to rely on it, too.

>Another issue I see is that even if buffers are locked within cmp, the
>status may change between two cmp...

Sure. That's not what I'm suggesting. Earlier versions of the patch kept an array of buffer headers exactly because of that.

>I do not think that locking all
>buffers for sorting them is an option. So on the whole, I think that
>locking buffers for sorting is probably not possible with the simple
>(and efficient) lightweight approach used in the patch.

Yes, the other version has a higher space overhead. I'm not convinced that's meaningful in comparison to shared buffers in space.
And I'm rather doubtful it's a loss performance-wise in a loaded server. All the buffer headers are touched on other cores, and doing the sort with indirection will greatly increase bus traffic.
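
To make the difference concrete, here is a rough sketch of the copy-then-sort variant. All names (BufHdr, SortItem, snapshot_and_sort) and the two-field sort key are assumptions for illustration, not the patch's actual structures. The keys are copied once into a private array, so the comparator only ever sees stable values and never dereferences a shared header:

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for a shared buffer header that other backends may modify. */
typedef struct BufHdr
{
	uint32_t	file_id;	/* hypothetical: which file the page belongs to */
	uint32_t	block_no;	/* hypothetical: block number within that file */
} BufHdr;

/* Private copy of the sort key plus the buffer it refers to. */
typedef struct SortItem
{
	uint32_t	file_id;
	uint32_t	block_no;
	int			buf_id;		/* index into the shared header array */
} SortItem;

/* Compares only the private copies, so results stay consistent. */
static int
sort_item_cmp(const void *a, const void *b)
{
	const SortItem *x = a;
	const SortItem *y = b;

	if (x->file_id != y->file_id)
		return (x->file_id < y->file_id) ? -1 : 1;
	if (x->block_no != y->block_no)
		return (x->block_no < y->block_no) ? -1 : 1;
	return 0;
}

/* Copy each header's key once, then sort the local array. */
static void
snapshot_and_sort(const BufHdr *hdrs, int nbuffers, SortItem *items)
{
	for (int i = 0; i < nbuffers; i++)
	{
		items[i].file_id = hdrs[i].file_id;
		items[i].block_no = hdrs[i].block_no;
		items[i].buf_id = i;
	}
	qsort(items, (size_t) nbuffers, sizeof(SortItem), sort_item_cmp);
}
```

A header changing after the copy can make an entry stale, but it cannot make the comparison function inconsistent during the sort. The extra space is one SortItem per buffer.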

>The good news, as I argued before, is that the order is only advisory
>to help with performance, but the correctness is really that all
>checkpoint buffers are written and fsync is called in the end, and
>does not depend on the buffer order. That is how it currently works
>anyway

It's not particularly desirable to have a performance feature that works less well if the server is heavily and concurrently loaded. The likelihood of bogus sort results will increase with the churn rate in shared buffers.

Andres

---
Please excuse brevity and formatting - I am writing this on my mobile phone.
