Re: Reducing the size of BufferTag & remodeling forks

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing the size of BufferTag & remodeling forks
Date: 2015-07-02 14:07:40
Message-ID: 20150702140740.GD16267@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-07-02 09:51:59 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > 1) Introduce a shared pg_relfilenode table. Every table, even
> > shared/nailed ones, gets an entry therein. It's there to make it
> > possible to uniquely allocate relfilenodes across databases &
> > tablespaces.
> > 2) Replace relation forks, with the exception of the init fork which is
> > special anyway, with separate relfilenodes. Stored in separate
> > columns in pg_class.
>
> > Thoughts?
>
> I'm concerned about the traffic and contention involved with #1.

I don't think that'll be that significant in comparison to all the other
work done when creating a relation. Unless we do something wrong it'll
be highly unlikely to get row level contention, as the oids of the
individual relations will be from the oid counter or something similar.

> I'm also concerned about the assumption that relfilenode should,
> or even can be, unique across an entire installation. (I suppose
> widening it to 8 bytes would fix some of the hazards there, but
> that bloats your buffer tag again.)

Why? Because it limits the number of relations & forks we can have to
2**32? That seems like an extraordinarily large limit. The catalog sizes
(pg_attribute most prominently) become a problem at a much lower number
of relations than that, as does rel/catcache management.

> But here's the big problem: you're talking about a huge amount of
> work for what seems likely to be a microscopic improvement in some
> operations.

I don't think it's microscopic at all. Just hacking database &
tablespace out of the hashing & comparisons in the buffer tag (obviously
not correct, but it works well enough for pgbench) results in quite measurable
performance benefits. But the main point isn't the performance
improvements themselves, but that it opens the door to smarter buffer
mapping algorithms, which e.g. will allow ordered access. Also not
having the current problem with increasing the number of forks would be
good.
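
To illustrate the size difference under discussion, here is a minimal
sketch, not PostgreSQL source. FullTag approximates the current tag
(tablespace, database, relfilenode, fork, block); SmallTag is a
hypothetical reduced tag that assumes relfilenodes are unique across the
installation and forks have become separate relfilenodes. All names are
made up for the example, and sketch_hash() (FNV-1a) is only a stand-in
for whatever hash function is really used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Approximation of the current tag: 5 x 4 bytes to hash and compare. */
typedef struct FullTag
{
    uint32_t spcNode;   /* tablespace */
    uint32_t dbNode;    /* database */
    uint32_t relNode;   /* relfilenode */
    int32_t  forkNum;   /* fork number */
    uint32_t blockNum;  /* block within the fork */
} FullTag;

/*
 * Hypothetical reduced tag. ASSUMPTION: relNode is unique across all
 * databases and tablespaces, and each former fork has its own relNode.
 */
typedef struct SmallTag
{
    uint32_t relNode;
    uint32_t blockNum;
} SmallTag;

/* FNV-1a over the key bytes; a stand-in, not the real hash function. */
static uint32_t
sketch_hash(const void *key, size_t len)
{
    const unsigned char *p = key;
    uint32_t    h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Two comparisons instead of five. */
static int
small_tag_equal(const SmallTag *a, const SmallTag *b)
{
    return a->relNode == b->relNode && a->blockNum == b->blockNum;
}

/* The sketch relies on neither struct containing padding bytes. */
static void
check_tag_sizes(void)
{
    assert(sizeof(FullTag) == 20);
    assert(sizeof(SmallTag) == 8);
}
```

The reduced tag fits in a single 64-bit word, which is what would make
cheaper comparisons and ordered lookup structures plausible.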

> Worse, we'll be taking penalties for other operations.
> How will you do DropDatabaseBuffers() for instance?

> CREATE DATABASE is going to be a problem, too.

More prominently than that, without access to the database/tablespace we
couldn't even write out dirty buffers in a reasonable manner. That's
why I think we're going to have to continue storing those two in the
buffer descriptors, just not include them in the buffer mapping.
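
As a rough sketch of that split (hypothetical struct and function names,
not the actual buffer manager code): the descriptor keeps the database
and tablespace next to the lookup key, so anything that walks the
descriptors, such as a DropDatabaseBuffers()-style scan or writing out a
dirty buffer, can still see them, while the mapping only hashes and
compares the key fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; field names are assumptions for illustration. */
typedef struct SketchBufferDesc
{
    /* Lookup key: the only fields the buffer mapping would hash/compare. */
    uint32_t relNode;
    uint32_t blockNum;

    /* Not part of the key; kept so writeback and drops can locate the file. */
    uint32_t spcNode;
    uint32_t dbNode;

    bool     valid;
    bool     dirty;
} SketchBufferDesc;

/*
 * Invalidate every buffer belonging to database dbid by scanning all
 * descriptors linearly. Returns the number of buffers dropped.
 */
static int
sketch_drop_database_buffers(SketchBufferDesc *bufs, int nbufs, uint32_t dbid)
{
    int dropped = 0;

    assert(bufs != NULL || nbufs == 0);

    for (int i = 0; i < nbufs; i++)
    {
        if (bufs[i].valid && bufs[i].dbNode == dbid)
        {
            bufs[i].valid = false;
            bufs[i].dirty = false;  /* contents are discarded, not written */
            dropped++;
        }
    }
    return dropped;
}
```

Since the scan never goes through the mapping, it does not care that the
database is missing from the key.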

Greetings,

Andres Freund
