Re: checkpointer continuous flushing

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-09-09 19:29:12
Message-ID: alpine.DEB.2.10.1509092056380.21932@sto
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Andres,

>>> Wouldn't it be just as easy to put this logic into the checkpointing
>>> code?
>>
>> Not sure it would simplify anything, because the checkpointer currently
>> knows about buffers but flushing is about files, which are hidden from
>> view.
>
> It'd not really simplify things, but it'd keep it local.

Ok, it would be local, but it would also mean that the checkpointer would
have to deal explicitely with files, whereas it currently does not have
to.

I think that the current buffer/file boundary is on engineering principle
a good one, so I tried to break it as little as possible to enable the
feature, and I wanted to avoid to have to do a buffer to file translation
twice, once in the checkpointer and once when writing the buffer.

>>> * Wouldn't a binary heap over the tablespaces + progress be nicer?
>>
>> I'm not sure where it would fit exactly.
>
> Imagine a binaryheap.h style heap over a structure like (tablespaceid,
> progress, progress_inc, nextbuf) where the comparator compares the progress.

It would replace what is currently an array. The balancing code needs to
enumerate all tablespaces in a round-robin way so as to ensure that all
tablespaces are given some attention, otherwise you could have a balance
on two tablespaces but others could be left out. The array makes this
property straightforward.

>> Anyway, I think it would complicate the code significantly (compared to the
>> straightforward array)
>
> I doubt it. I mean instead of your GetNext you'd just do:
> next_tblspc = DatumGetPointer(binaryheap_first(heap));
> if (next_tblspc == 0)
> return 0;
> next_tblspc.progress += next_tblspc.progress_slice;
> binaryheap_replace_first(PointerGetDatum(next_tblspc));
>
> return next_tblspc.nextbuf++;

Compare to the array, this tree approach would required ln(Ntablespace)
work to extract and reinsert the tablespace under progress, so there is no
complexity advantage.

Moreover, given that in most cases there are 1 or 2 tablespaces, a tree
structure is really on the heavy side.

> progress_slice is the number of buffers in the tablespace divided by the
> number of total buffers, to avoid doing any sort of expensive math in
> the more frequently executed path.

If there are many buffers, I'm not too sure about rounding issues and the
like, so the current approach with a rational seems more secure.

> [...] I'm not seing where you'd need an extra pointer?

Indeed, I misunderstood.

> [...] What for would you need to bsearch?

To find the tablespace boundaries in the sorted buffer array in
log(NBuffers) * Ntablespace, instead of NBuffers.

>> I think that the current approach is ok as the number of tablespace
>> should be small.
>
> Right that's often the case.

Yep.

>> ISTM that using a tablespace in the sorting would reduce the complexity to
>> ln(NBuffers) * Ntablespace for finding the boundaries, and then Nbuffers *
>> (Ntablespace/Ntablespace) = NBuffers for scanning, at the expense of more
>> memory and code complexity.
>
> Afaics finding the boundaries can be done as part of the enumeration of
> tablespaces in BufferSync(). That code needs to be moved, but that's not
> too bad. I don't see the code be that much more complicated?

Hmmm. you are proposing to replace prooved and heavilly tested code by a
more complex tree data structures distributed quite differently around the
source, and no very clear benefit.

So I would prefer to keep the code as is, that is pretty straightforward,
and wait for a strong incentive before doing anything fancier. ISTM that
there are other places in pg need attention more that further tweaking
this patch.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-09-09 19:36:25 Re: checkpointer continuous flushing
Previous Message Andres Freund 2015-09-09 19:12:32 Re: checkpointer continuous flushing