Dead Space Map version 2

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Dead Space Map version 2
Date: 2007-02-27 03:05:57
Message-ID: 20070227113655.6381.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches pgsql-performance

This is the second proposal for Dead Space Map (DSM).
Here is the previous discussion:
http://archives.postgresql.org/pgsql-hackers/2006-12/msg01188.php

I'll post the next version of the Dead Space Map patch to -patches.
I've implemented 2bits/page bitmap and new vacuum commands.
Memory management and recovery features are not done yet.

I think it's better to get DSM and HOT together. DSM is good at complex
updated cases but not at heavily updated cases. HOT has opposite aspects,
as far as I can see. I think they can cover each other.

2bits/page bitmap
-----------------

Each heap pages have 4 states for dead space map; HIGH, LOW, UNFROZEN and
FROZEN. VACUUM uses the states to reduce the number of target pages.

- HIGH : High priority to vacuum. Maybe many dead tuples in the page.
- LOW : Low priority to vacuum Maybe few dead tuples in the page.
- UNFROZEN : No dead tuples, but some unfrozen tuples in the page.
- FROZEN : No dead nor unfrozen tuples in the page.

If we do UPDATE a tuple, the original page containing the tuple is marked
as HIGH and the new page where the updated tuple is placed is marked as LOW.
When we commit the transaction, the updated tuples needs only FREEZE.
That's why the after-page is marked as LOW. However, If we rollback, the
after-page should be vacuumed, so we should mark the page LOW, not UNFROZEN.
We don't know the transaction will commit or rollback at the UPDATE.

If we combine this with the HOT patch, pages with HOT tuples are probably
marked as UNFROZEN because we don't bother vacuuming HOT tuples. They can
be removed incrementally and doesn't require explicit vacuums.

In future work, we can do index-only-scan for tuples that is in UNFROZEN or
FROZEN pages. (currently not implemented)

VACUUM commands
---------------

VACUUM now only scans the pages that possibly have dead tuples.
VACUUM ALL, a new syntax, behaves as the same as before.

- VACUUM FULL : Not changed. scans all pages and compress them.
- VACUUM ALL : Scans all pages; Do the same behavior as previous VACUUM.
- VACUUM : Scans only HIGH pages usually, but also LOW and UNFROZEN
pages on vacuums in the cases for preventing XID wraparound.

The commitment of oldest XID for VACUUM is not changed. There should not be
tuples that XIDs are older than (Current XID - vacuum_freeze_min_age) after
VACUUM. If the VACUUM can guarantee the commitment, it scans only HIGH pages.
Otherwise, it scans HIGH, LOW and UNFROZEN pages for FREEZE.

Performance issues
------------------

* Enable/Disable DSM tracking per tables
DSM requires more or less additional works. If we know specific tables
where DSM does not work well, ex. heavily updated small tables, we can
disable DSM for it. The syntax is:
ALTER TABLE name SET (dsm=true/false);

* Dead Space State Cache
The DSM management module is guarded using one LWLock, DeadSpaceLock.
Almost all accesses to DSM requires only shared lock, but the frequency
of shared lock was very high (tied with BufMappingLock) in my research.
To avoid the lock contention, I added a cache of dead space state in
BufferDesc flags. Backends see the flags first, and avoid locking if no
need to

* Agressive freezing
We will freeze tuples in dirty pages using OldestXmin but FreezeLimit.
This is for making FROZEN pages but not UNFROZEN pages as far as possible
in order to reduce works in XID wraparound vacuums.

Memory management
-----------------

In current implementation, DSM allocates a bunch of memory at start up and
we cannot modify it in running. It's maybe enough because DSM consumes very
little memory -- 32MB memory per 1TB database.

There are 3 parameters for FSM and DSM.

- max_fsm_pages = 204800
- max_fsm_relations = 1000 (= max_dsm_relations)
- max_dsm_pages = 4096000

I'm thinking to change them into 2 new paramaters. We will allocates memory
for DSM that can hold all of estimated_database_size, and for FSM 50% or
something of the size. Is this reasonable?

- estimated_max_relations = 1000
- estimated_database_size = 4GB (= about max_fsm_pages * 8KB * 2)

Recovery
--------

I've already have a recovery extension. However, it can recover DSM
but not FSM. Do we also need to restore FSM? If we don't, unreusable
pages might be left in heaps. Of cource it could be reused if another
tuple in the page are updated, but VACUUM will not find those pages.

Comments and suggestions are really appreciated.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2007-02-27 03:06:47 Re: COMMIT NOWAIT Performance Option
Previous Message Alvaro Herrera 2007-02-27 03:05:25 Re: autovacuum next steps, take 2

Browse pgsql-patches by date

  From Date Subject
Next Message John Bartlett 2007-02-27 03:23:17
Previous Message ITAGAKI Takahiro 2007-02-27 02:10:01 Re: [HACKERS] Load distributed checkpoint

Browse pgsql-performance by date

  From Date Subject
Next Message Shane Ambler 2007-02-27 05:05:13 Re: Two hard drives --- what to do with them?
Previous Message Jeff Davis 2007-02-27 00:25:14 Re: Two hard drives --- what to do with them?