Re: checkpointer continuous flushing

From: Andres Freund <andres(at)anarazel(dot)de>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-09-06 21:39:58
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2015-09-06 16:05:01 +0200, Fabien COELHO wrote:
> >Wouldn't it be just as easy to put this logic into the checkpointing code?
> Not sure it would simplify anything, because the checkpointer currently
> knows about buffers but flushing is about files, which are hidden from
> view.

It'd not really simplify things, but it'd keep it local.

> >* Wouldn't a binary heap over the tablespaces + progress be nicer?
> I'm not sure where it would fit exactly.

Imagine a binaryheap.h style heap over a structure like (tablespaceid,
progress, progress_inc, nextbuf) where the comparator compares the progress.

> Anyway, I think it would complicate the code significantly (compared to the
> straightforward array)

I doubt it. I mean instead of your GetNext you'd just do:
next_tblspc = DatumGetPointer(binaryheap_first(heap));
if (next_tblspc == 0)
return 0;
next_tblspc.progress += next_tblspc.progress_slice;

return next_tblspc.nextbuf++;

progress_slice is the number of buffers in the tablespace divided by the
number of total buffers, to avoid doing any sort of expensive math in
the more frequently executed path.

> Moreover such a data structure would probably require some kind of pointer
> (probably 8 bytes added per node, maybe more), and the amount of memory is
> already a concern, at least to me, and moreover it has to reside in shared
> memory which does not simplify allocation of tree data structures.

I'm not seing where you'd need an extra pointer? Maybe the
misunderstanding is that I'm proposing to do a heap over the
*tablespaces* not the actual buffers.

> >If you make the sorting criterion include the tablespace id you wouldn't
> >need the lookahead loop in NextBufferToWrite().
> Yep, I thought of it. It would mean 4 more bytes per buffer, and bsearch to
> find the boundaries, so significantly less simple code.

What for would you need to bsearch?

> I think that the current approach is ok as the number of tablespace
> should be small.

Right that's often the case.

> >Isn't the current approach O(NBuffers^2) in the worst case?
> ISTM that the overall lookahead complexity is Nbuffers * Ntablespace:
> buffers are scanned once for each tablespace.

Which in the worst case is NBuffers * 2...

> ISTM that using a tablespace in the sorting would reduce the complexity to
> ln(NBuffers) * Ntablespace for finding the boundaries, and then Nbuffers *
> (Ntablespace/Ntablespace) = NBuffers for scanning, at the expense of more
> memory and code complexity.

Afaics finding the boundaries can be done as part of the enumeration of
tablespaces in BufferSync(). That code needs to be moved, but that's not
too bad. I don't see the code be that much more complicated?


Andres Freund

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2015-09-06 22:47:17 Re: Too many duplicated condition query return wrong value
Previous Message 2015-09-06 21:18:02 Re: [PATCH] Refactoring of LWLock tranches