Re: checkpointer continuous flushing

From: Andres Freund <andres(at)anarazel(dot)de>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-08-17 11:41:38
Message-ID: 20150817114138.GG3522@awork2.anarazel.de
Lists: pgsql-hackers

On 2015-08-11 17:15:22 +0200, Fabien COELHO wrote:
> +void
> +PerformFileFlush(FileFlushContext * context)
> +{
> + if (context->ncalls != 0)
> + {
> + int rc;
> +
> +#if defined(HAVE_SYNC_FILE_RANGE)
> +
> + /* Linux: tell the memory manager to move these blocks to io so
> + * that they are considered for being actually written to disk.
> + */
> + rc = sync_file_range(context->fd, context->offset, context->nbytes,
> + SYNC_FILE_RANGE_WRITE);
> +
> +#elif defined(HAVE_POSIX_FADVISE)
> +
> + /* Others: say that data should not be kept in memory...
> + * This is not exactly what we want to say, because we want to write
> + * the data for durability but we may need it later nevertheless.
> + * It seems that Linux would free the memory *if* the data has
> + * already been written to disk, else the "dontneed" call is ignored.
> + * For FreeBSD this may have the desired effect of moving the
> + * data to the io layer, although the system does not seem to
> + * take into account the provided offset & size, so it is rather
> + * rough...
> + */
> + rc = posix_fadvise(context->fd, context->offset, context->nbytes,
> + POSIX_FADV_DONTNEED);
> +
> +#endif
> +
> + if (rc < 0)
> + ereport(ERROR,
> + (errcode_for_file_access(),
> + errmsg("could not flush block " INT64_FORMAT
> + " on " INT64_FORMAT " blocks in file \"%s\": %m",
> + context->offset / BLCKSZ,
> + context->nbytes / BLCKSZ,
> + context->filename)));
> + }

I'm a bit wary that this might cause significant regressions on
platforms that don't support sync_file_range() but do support
posix_fadvise(), for workloads that are bigger than shared_buffers.
Consider what happens if the workload does *not* fit into shared_buffers
but *does* fit into the OS's buffer cache: POSIX_FADV_DONTNEED asks the
kernel to drop those pages, so subsequent reads of them will suddenly go
to disk again, no?
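
One rough way to check this concern on Linux is to count how many pages of a
file are resident in the page cache before and after the fallback hint. The
sketch below is not part of the patch: the helper names resident_pages() and
demo_dontneed() and the /tmp path are made up for illustration, and it assumes
Linux semantics for mincore(). Note also that posix_fadvise() returns an error
number directly rather than -1 with errno set.

```c
#define _GNU_SOURCE             /* exposes mincore() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the pages among the first len bytes of fd that
 * are currently resident in the OS page cache. Returns -1 on error.
 */
long
resident_pages(int fd, size_t len)
{
    long        page = sysconf(_SC_PAGESIZE);
    size_t      npages;
    size_t      i;
    unsigned char *vec;
    void       *map;
    long        count = 0;

    assert(page > 0);
    npages = (len + (size_t) page - 1) / (size_t) page;

    /* Mapping the file does not fault pages in; mincore() only inspects. */
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    vec = malloc(npages);
    if (vec == NULL || mincore(map, len, vec) != 0)
    {
        free(vec);
        munmap(map, len);
        return -1;
    }
    for (i = 0; i < npages; i++)
        count += vec[i] & 1;    /* low bit set means "resident" */

    free(vec);
    munmap(map, len);           /* unmap so the mapping cannot pin pages */
    return count;
}

/*
 * Hypothetical demo: write len bytes to a temporary file, fsync it, then
 * issue the same POSIX_FADV_DONTNEED hint as the fallback branch of the
 * quoted patch. Stores the resident page counts observed before and after
 * the hint. Returns 0 on success, -1 on any failure.
 */
int
demo_dontneed(size_t len, long *before, long *after)
{
    char        path[] = "/tmp/fadvise_demo_XXXXXX";
    int         fd = mkstemp(path);
    char       *buf;
    int         rc;

    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */

    buf = malloc(len);
    if (buf == NULL)
    {
        close(fd);
        return -1;
    }
    memset(buf, 'x', len);
    if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
    {
        free(buf);
        close(fd);
        return -1;
    }
    free(buf);

    *before = resident_pages(fd, len);
    /* Returns 0 or an error number; it does not return -1 and set errno. */
    rc = posix_fadvise(fd, 0, (off_t) len, POSIX_FADV_DONTNEED);
    *after = resident_pages(fd, len);

    close(fd);
    return (rc == 0 && *before >= 0 && *after >= 0) ? 0 : -1;
}
```

Calling demo_dontneed(64 * 1024, &before, &after) and comparing the two counts
shows whether the kernel actually dropped the clean pages. The outcome depends
on the filesystem (tmpfs, for instance, is likely to keep everything resident),
so treat any numbers as an observation about one machine, not a general result.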

Greetings,

Andres Freund
