Re: Plans for solving the VACUUM problem

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Plans for solving the VACUUM problem
Date: 2001-05-18 02:27:51
Message-ID: 200105180227.f4I2Rpa13258@candle.pha.pa.us
Lists: pgsql-hackers

> Free space map details
> ----------------------
>
> I envision the FSM as a shared hash table keyed by table ID, with each
> entry containing a list of page numbers and free space in each such page.
>
> The FSM is empty at system startup and is filled by lazy VACUUM as it
> processes each table. Backends then decrement/remove page entries as they
> use free space.
>
> Critical point: the FSM is only a hint and does not have to be perfectly
> accurate. It can omit space that's actually available without harm, and
> if it claims there's more space available on a page than there actually
> is, we haven't lost much except a wasted ReadBuffer cycle. This allows
> us to take shortcuts in maintaining it. In particular, we can constrain
> the FSM to a prespecified size, which is critical for keeping it in shared
> memory. We just discard entries (pages or whole relations) as necessary
> to keep it under budget. Obviously, we'd not bother to make entries in
> the first place for pages with only a little free space. Relation entries
> might be discarded on a least-recently-used basis.

The only question I have is about the Free Space Map. It would seem
better to me if we could get this map closer to the table itself, rather
than having every table of every database mixed into the same shared
memory area. I can easily see random access across many tables evicting
most of the map's entries and making it much less useful.

It would be nice if we could store the map on the first page of the disk
table, or store it in a flat file per table. I know neither of these ideas
will work, but I am throwing them out to see if someone has a
better idea.

I wonder if FSM misses (requests for free space that the map cannot
satisfy) should be what drives the vacuum daemon to vacuum a table. Sort
of like, "Hey, someone is asking for free pages for that table. Let's go
find some!" That may work really well.

Another advantage of centralization is that we can record update/delete
counters per table, telling vacuum which table to vacuum next instead of
having it roam around looking for old tuples, which seems wasteful.
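The idea of letting FSM misses and per-table update/delete counters steer the vacuum daemon could look something like this minimal C sketch. Everything in it is hypothetical: RelStats, MISS_WEIGHT, the threshold parameter and all function names are invented for illustration, and weighting one FSM miss as 100 dead tuples is an arbitrary assumption, not something proposed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t RelId;

/* Hypothetical per-table counters kept in the centralized shared area. */
typedef struct {
    RelId    rel;
    uint64_t dead_tuples;   /* updates + deletes since the last vacuum */
    uint64_t fsm_misses;    /* requests for free space the FSM could not satisfy */
} RelStats;

/* Hypothetical weight: one FSM miss counts as much as this many dead tuples. */
#define MISS_WEIGHT 100

void note_update_or_delete(RelStats *s) { s->dead_tuples++; }
void note_fsm_miss(RelStats *s)         { s->fsm_misses++; }

static uint64_t vacuum_score(const RelStats *s)
{
    return s->dead_tuples + MISS_WEIGHT * s->fsm_misses;
}

/* Return the index of the table the daemon should vacuum next: the highest
   score that reaches threshold, or -1 if no table is worth vacuuming yet. */
int pick_next_vacuum(const RelStats *stats, size_t n, uint64_t threshold)
{
    int best = -1;
    uint64_t best_score = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t score = vacuum_score(&stats[i]);
        if (score >= threshold && (best < 0 || score > best_score)) {
            best = (int)i;
            best_score = score;
        }
    }
    return best;
}

/* After vacuuming a table, reset its counters so it must earn its next turn. */
void note_vacuumed(RelStats *s)
{
    s->dead_tuples = 0;
    s->fsm_misses = 0;
}
```

The daemon would loop calling pick_next_vacuum(), vacuum the chosen table, then call note_vacuumed() on it, sleeping whenever -1 comes back. A table whose free-space requests keep failing rises to the top quickly, which is the "someone is asking for free pages" trigger.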

Also, I suppose if we have the map act as a shared table cache (fseek
info), that may outweigh the disadvantage of having it all centralized.

I know I am arguing both for and against centralization, but I thought
I would put the ideas out there.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
