Re: Rewriting Free Space Map

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rewriting Free Space Map
Date: 2008-03-17 19:26:26
Message-ID: 47DEC5E2.6090103@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>> Tom Lane wrote:
>>> You're cavalierly waving away a whole boatload of problems that will
>>> arise as soon as you start trying to make the index AMs play along
>>> with this :-(.
>
>> It doesn't seem very hard.
>
> The problem is that the index AMs are no longer in control of what goes
> where within their indexes, which has always been their prerogative to
> determine. The fact that you think you can kluge btree to still work
> doesn't mean that it will work for other AMs.

Well, it does work with all the existing AMs AFAICS. I do agree with the
general point; it'd certainly be cleaner, more modular and more flexible
if the AMs didn't need to know about the existence of the maps.

>>> The idea that's becoming attractive to me while contemplating the
>>> multiple-maps problem is that we should adopt something similar to
>>> the old Mac OS idea of multiple "forks" in a relation.
>
>> Hmm. You also need to teach at least xlog.c and xlogutils.c about the
>> map forks, for full page images and the invalid page tracking.
>
> Well, you'd have to teach them something anyway, for any incarnation
> of maps that they might need to update.

Umm, the WAL code doesn't care where the pages it operates on came from.
Sure, we'll need rmgr-specific code that knows what to do with the maps,
but the full page image code would work without changes with the
multiple RelFileNode approach.

The essential change with the map fork idea is that a RelFileNode no
longer uniquely identifies a file on disk (ignoring the segmentation
which is handled in smgr for now). Anything that operates on
RelFileNodes, without any higher level information of what it is, needs
to be modified to use RelFileNode+forkid instead. That includes at least
the buffer manager, smgr, and the full page image code in xlog.c.

It's probably a pretty mechanical change, even though it affects a lot
of code. We'd probably want to have a new struct, let's call it
PhysFileId for now, for RelFileNode+forkid, and basically replace all
occurrences of RelFileNode with PhysFileId in smgr, bufmgr and xlog code.
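
To make that concrete, here is a rough sketch of what such a struct might
look like. It is only an illustration: the RelFileNode below is a
simplified stand-in, and the ForkId values and helper function names are
hypothetical, not existing PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Simplified stand-in for RelFileNode: tablespace, database, relation. */
typedef struct RelFileNode
{
    Oid spcNode;
    Oid dbNode;
    Oid relNode;
} RelFileNode;

/* Hypothetical fork identifiers; the names are illustrative only. */
typedef enum ForkId
{
    FORK_MAIN = 0,      /* main data portion */
    FORK_FSM = 1,       /* free space map */
    FORK_NUM_FORKS
} ForkId;

/* RelFileNode+forkid: identifies one file on disk (before segmentation). */
typedef struct PhysFileId
{
    RelFileNode rnode;
    ForkId      forkid;
} PhysFileId;

/* Two ids name the same file only if both the node and the fork match. */
static bool
PhysFileIdEquals(PhysFileId a, PhysFileId b)
{
    return a.rnode.spcNode == b.rnode.spcNode &&
           a.rnode.dbNode == b.rnode.dbNode &&
           a.rnode.relNode == b.rnode.relNode &&
           a.forkid == b.forkid;
}

/* True if the id refers to an auxiliary map rather than the main data. */
static bool
PhysFileIdIsAuxiliary(PhysFileId id)
{
    assert((int) id.forkid >= 0 && id.forkid < FORK_NUM_FORKS);
    return id.forkid != FORK_MAIN;
}
```

With something like this, code in smgr, bufmgr and xlog that compares or
hashes a RelFileNode today would compare or hash a PhysFileId instead.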

>> I also wonder what the performance impact of extending BufferTag is.
>
> That's a fair objection, and obviously something we'd need to check.
> But I don't recall seeing hash_any so high on any profile that I think
> it'd be a big problem.

I do remember seeing hash_any in some oprofile runs. But that's fairly
easy to test: we don't need to actually implement any of this. It's
enough to add a field to BufferTag and run pgbench.
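
As a rough illustration of what changes for the hash, the sketch below
compares a tag with and without a fork field. The struct layouts are
simplified stand-ins for BufferTag, and the FNV-1a function is just a
placeholder for hash_any so that the example is self-contained; it says
nothing about the real cost, which only a pgbench run can show.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for today's tag: RelFileNode fields + block number. */
typedef struct PlainBufferTag
{
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
    uint32_t blockNum;
} PlainBufferTag;

/* Same tag with one extra (hypothetical) fork field. */
typedef struct ForkBufferTag
{
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
    uint32_t forkNum;
    uint32_t blockNum;
} ForkBufferTag;

/* Placeholder byte-wise hash (32-bit FNV-1a), NOT PostgreSQL's hash_any. */
static uint32_t
hash_bytes_fnv1a(const void *key, size_t len)
{
    const unsigned char *p = (const unsigned char *) key;
    uint32_t h = 2166136261u;
    size_t i;

    for (i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* The hash simply has four more bytes of key to process per lookup. */
static uint32_t
hash_fork_tag(const ForkBufferTag *tag)
{
    return hash_bytes_fnv1a(tag, sizeof(ForkBufferTag));
}
```

The only point here is that the key grows by one 4-byte field, so each
buffer lookup hashes and compares slightly more data.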

>> My original thought was to have a separate RelFileNode for each of the
>> maps. That would require no smgr or xlog changes, and not very many
>> changes in the buffer manager, though I guess you'd need more catalog
>> changes. You had doubts about that on the previous thread
>> (http://archives.postgresql.org/pgsql-hackers/2007-11/msg00204.php), but
>> the "map forks" idea certainly seems much more invasive than that.
>
> The main problems with that are (a) the need to expose every type of map
> in pg_class and (b) the need to pass all those relfilenode numbers down
> to pretty low levels of the system.

(a) is certainly a valid point. Regarding (b), I don't think the low
level stuff (I assume you mean smgr, bufmgr, bgwriter, xlog by that)
would need to be passed any additional relfilenode numbers. Or rather,
they already work with relfilenodes, and they don't need to know whether
the relfilenode is for an index, a heap, or an FSM attached to something
else. The relfilenodes would be in RelationData, and we already have
that around whenever we do anything that needs to differentiate between
those.
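
A minimal sketch of that layering, under the separate-relfilenode
approach: the struct and field names below (RelationSketch, rd_fsm_node)
and the low-level function are hypothetical stand-ins, not the real
RelationData or smgr API.

```c
#include <stdint.h>

typedef uint32_t Oid;

/* Simplified stand-in for RelFileNode. */
typedef struct RelFileNode
{
    Oid spcNode;
    Oid dbNode;
    Oid relNode;
} RelFileNode;

/* Hypothetical slice of RelationData: one relfilenode per attached file. */
typedef struct RelationSketch
{
    RelFileNode rd_node;        /* the heap or index itself */
    RelFileNode rd_fsm_node;    /* its FSM, stored as a separate relfilenode */
} RelationSketch;

/*
 * Stand-in for a low-level routine (smgr, bufmgr, ...): it receives a
 * RelFileNode and has no idea whether it belongs to a heap, index or FSM.
 */
static Oid
lowlevel_file_number(RelFileNode rnode)
{
    return rnode.relNode;
}

/* Higher-level code picks the right relfilenode from the Relation. */
static Oid
fsm_file_number(const RelationSketch *rel)
{
    return lowlevel_file_number(rel->rd_fsm_node);
}
```

Only the caller that holds the Relation decides which relfilenode to use;
nothing extra is passed down to the lower layers.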

Another consideration is which approach is easiest to debug. The "map
fork" approach seems better on that front, as you can immediately see
from the PhysFileId if a page is coming from an auxiliary map or the
main data portion. That might turn out to be handy in the buffer manager
or bgwriter as well; they don't currently have any knowledge of what a
page contains.

> The nice thing about the fork idea
> is that you don't need any added info to uniquely identify what relation
> you're working on. The fork numbers would be hard-wired into whatever
> code needed to know about particular forks. (Of course, these same
> advantages apply to using special space in an existing file. I'm
> just suggesting that we can keep these advantages without buying into
> the restrictions that special space would have.)

I don't see that advantage. All the higher-level code that cares which
relation you're working on already has Relation around. All the
lower-level stuff doesn't care.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
