Re: unlogged tables v5

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: unlogged tables v5
Date: 2010-12-27 11:01:15
Message-ID: AANLkTik+QeHF+qYjCXw-z0Y_rKY1qaTnGqN13ZJfcM4=@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 27, 2010 at 3:22 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Not sure from reading the docs whether unlogged indexes are supported on
> logged tables? Could you clarify (or clarify more often)? Does this
> solve the hash index situation?

They are not. The only place you'll see that the grammar has been
modified is in the CREATE TABLE documentation. The table is either
unlogged or not, and if it is unlogged, then its indices are also
unlogged. As I mentioned on one of the previous threads, I think
unlogged indexes on logged tables would be a useful thing to support,
but it turns out to require somewhat different infrastructure from
what I've built here, and I don't have any imminent plans to work on
it.

> Unlogged tables aren't replicated, but they would be copied as part of a
> base backup. I'd request that we have a different forkname, or some
> other indicator that allows a base backup to allow exclusion of such
> files, since they are going to get reset to zero very soon afterwards
> anyway. Not everyone would wish it, but its a good option.

You can identify them by looking for files that have an _init fork,
and then skipping the corresponding main-fork files. I initially had
the idea of actually giving the main fork a distinguishing name, but
this turns out to have two serious defects: (1) it makes the patch a
lot more invasive, to the point where if I'd had to keep going that
route I likely would have given up on the feature altogether, and (2)
you still need the init fork anyway. I agree this is a bit of a drag
to work with for base backups but I don't think there's a reasonable
way to do better.

>> There are a couple of open issues which I'm thinking can be left for
>> future work.
>>
>> A. Minimization of fsync traffic.  fsyncs on unlogged relations can
>> potentially be postponed until shutdown time.  Right now, they'll
>> happen as part of the next checkpoint.
>
> Half the fun of unlogged tables was for me the ability to skip the fsync
> and the checkpoint writes. If we're using PostgreSQL as a cache, it will
> be a little hard to explain why it still does writes in a huge storm
> every so often. A performance feature that doesn't avoid a performance
> hit seems a little strange, to say the least.
>
> Should be easy enough to mark a flag on each buffer, only examined at
> checkpoint.

Well, there is such a flag, but it just skips the checkpoint write.
If the cleaning scan or a backend does a write, an fsync is still
scheduled (and gets performed at the next checkpoint). The problem
here is that there was a groundswell of support for making unlogged
tables survive a clean shutdown, and you've got to eventually perform
these fsyncs in order for that to work. Now you could postpone them
until the shutdown checkpoint, but that's only going to help in a
limited set of cases. Specifically:

1. If the unlogged tables fit entirely within shared_buffers, it won't help.
2. If the operating system writes back the dirty buffers before the
next checkpoint, it won't help.
3. If you're running ext3, then the first fsync on anything is going
to flush the entire buffer cache for that FS out to disk anyway, so
unless pg_xlog is on a separate partition AND you have absolutely no
writes to any normal tables since the last checkpoint, it won't help.

We could probably get a test system set up where we run XFS on Linux
with an unlogged table larger than shared buffers and demonstrate
that, in fact, there is some preventable suckiness there.

But even then, I'm not convinced I should try to fix that as part of
this patch. First, of course, there is a huge performance gain from
using unlogged tables even without this optimization. I have tested
it; it's big. Second, we already know that the background writer's
algorithm for scheduling fsyncs is not optimal. It may be that if we
end up rejiggering that algorithm, something will fall out to handle
this case, too. I'd rather not contort the logic too much to handle
this case until we have a better idea of how we want it to work in
general.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2010-12-27 11:15:11 Re: Streaming replication as a separate permissions
Previous Message Magnus Hagander 2010-12-27 11:00:30 Re: Streaming replication as a separate permissions