Re: unlogged tables

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Kenneth Marshall <ktm(at)rice(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, marcin mank <marcin(dot)mank(at)gmail(dot)com>, Andy Colson <andy(at)squeakycode(dot)net>
Subject: Re: unlogged tables
Date: 2010-11-17 20:35:05
Message-ID: 201011172135.06272.andres@anarazel.de
Lists: pgsql-hackers

On Wednesday 17 November 2010 20:54:14 Robert Haas wrote:
> On Wed, Nov 17, 2010 at 2:31 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> >>> The customer is always right, and I think we are hearing loud and clear
> >>> what the customers want. Please let's not go out of our way to create
> >>> a feature that isn't what they want.
> >>
> >> I would be fine with only having a safe shutdown with unlogged tables
> >> and skipping the checkpoint I/O all other times.
> >
> > Yeah, I was just thinking something like that would be good, and should
> > overcome Robert's objection to the whole idea.
>
> Could we slow down here a bit and talk through the ideas here in a
> logical fashion?
>
> The customer is always right, but the informed customer makes better
> decisions than the uninformed customer. This idea, as proposed, does
> not work. If you only include dirty buffers at the final checkpoint
> before shutting down, you have no guarantee that any buffers that you
> either didn't write or didn't fsync previously are actually on disk.
> Therefore, you have no guarantee that the table data is not corrupted.
> So you really have to decide between including the unlogged-table
> buffers in EVERY checkpoint and not ever including them at all. Which
> one is right depends on your use case.
How can you get a buffer which was not written out *at all*? Do you want to
force all such pages to stay in shared_buffers? That sounds quite a bit more
complicated than what you proposed...

> For example, consider the poster who said that, when this feature is
> available, they plan to try ripping out their memcached instance and
> replacing it with PostgreSQL running unlogged tables. Suppose this
> poster (or someone else in a similar situation) has a 64 GB machine and is
> currently running a 60 GB memcached instance on it, which is not an
> unrealistic scenario for memcached. Suppose further that he dirties
> 25% of that data each hour. memcached is currently doing no writes to
> disk. When he switches to PostgreSQL and sets checkpoint_segments to
> a gazillion and checkpoint_timeout to the maximum, he's going to start
> writing 15 GB of data to disk every hour - data which he clearly
> doesn't care about losing, or preserving across restarts, because he's
> currently storing it in memcached. In fact, with memcached, he'll not
> only lose data at shutdown - he'll lose data on a regular basis when
> everything is running normally. We can try to convince ourselves that
> someone in this situation will not care about needing to get 15GB of
> disposable data per hour from memory to disk in order to have a
> feature that he doesn't need, but I think it's going to be pretty hard
> to make that credible.
To really support that use case we would first need to make shared_buffers
properly scale to 64GB - which unfortunately, in my experience, is not yet the
case.
Also, see the issues in the previous paragraph - I have severe doubts that you
can support such a memcached scenario with pg. Either you spill to disk when
your buffers overflow (fine with me), or you throw away data the way memcached
does. I doubt there is a sensible implementation of the latter in pg.

So you will have to write to disk at some point...

> Third use case. Someone on pgsql-general mentioned that they want to
> write logs to PG, and can abide losing them if a crash happens, but
> not on a clean shutdown and restart. This person clearly shuts down
> their production database a lot more often than I do, but that is OK.
> By explicit stipulation, they want the survive-a-clean-shutdown
> behavior. I have no problem supporting that use case, providing they
> are willing to take the associated performance penalty at checkpoint
> time, which we don't know because we haven't asked, but I'm fine with
> assuming it's useful even though I probably wouldn't use it much
> myself.
Maybe I am missing something - but why does this imply we have to write data
at checkpoints?
Just fsyncing every file belonging to a persistently-unlogged table (or
whatever sensible name anyone can come up with) is not prohibitively
expensive - in fact, on a local $PGDATA of approx 300GB with loads of tables,
doing so takes less than 15s on a system with a hot inode/dentry cache and no
dirty files.
(just `find $PGDATA -print0|xargs -0 fsync_many_files`, with fsync_many_files
being a tiny C program that first does posix_fadvise(POSIX_FADV_DONTNEED) on
all the files and then fsyncs every one)
The assumption of a hot inode cache is realistic, I think.
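
For illustration, a minimal sketch of what such a helper could look like -
the actual fsync_many_files is not shown here, so the argument handling and
error handling below are assumptions, not the original program:

    /*
     * fsync_many_files.c - sketch of a helper that takes file names on the
     * command line (e.g. via `find ... -print0 | xargs -0`) and fsyncs them
     * all, hinting the kernel to start writeback on every file first.
     */
    #define _XOPEN_SOURCE 600       /* for posix_fadvise() */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
        int *fds = malloc(sizeof(int) * argc);

        if (fds == NULL)
        {
            perror("malloc");
            return 1;
        }

        /*
         * First pass: on Linux, POSIX_FADV_DONTNEED starts asynchronous
         * writeback of any dirty pages, so the I/O for all the files can
         * proceed in parallel.
         */
        for (int i = 1; i < argc; i++)
        {
            fds[i] = open(argv[i], O_RDONLY);
            if (fds[i] < 0)
            {
                perror(argv[i]);
                continue;
            }
            (void) posix_fadvise(fds[i], 0, 0, POSIX_FADV_DONTNEED);
        }

        /*
         * Second pass: fsync each file, mostly just waiting for the
         * writeback the fadvise calls already kicked off.
         */
        for (int i = 1; i < argc; i++)
        {
            if (fds[i] < 0)
                continue;
            if (fsync(fds[i]) != 0)
                perror(argv[i]);
            close(fds[i]);
        }
        free(fds);
        return 0;
    }

The point of the two passes is that the fadvise calls let the kernel start
writeback on all files at once, instead of serializing the I/O behind one
fsync at a time.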

Andres
