Re: Dynamic Partitioning using Segment Visibility Maps

From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Dynamic Partitioning using Segment Visibility Maps
Date: 2008-01-03 00:41:04
Message-ID: 20080103004104.GK11262@frubble.xen.chris-lamb.co.uk
Lists: pgsql-hackers

On Wed, Jan 02, 2008 at 05:56:14PM +0000, Simon Riggs wrote:
> Like it?

Sounds good. I've only given it a quick scan though. Would read-only
segments retain the same disk-level format as they have currently? It
seems possible to remove the MVCC fields and hence get more tuples per
page. Whether this would actually be a net performance gain or loss
seems like a difficult question to answer; it would definitely be a
complexity increase though.
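
As a rough back-of-envelope for the tuples-per-page point, here is a
small Python sketch. The figures are assumptions, not a proposal: an
8192-byte page, a 24-byte page header, a 4-byte line pointer per tuple,
a 23-byte heap tuple header and 8-byte alignment. The 11-byte "slim"
header is purely hypothetical (the 23-byte header minus 12 bytes for
xmin, xmax and the command id), and the sketch ignores null bitmaps,
fill factor and special space.

```python
PAGE_SIZE = 8192     # assumed default block size
PAGE_HEADER = 24     # assumed fixed page header size
LINE_POINTER = 4     # assumed per-tuple item pointer size
MAXALIGN = 8         # assumed alignment (typical 64-bit platform)


def align(n, boundary=MAXALIGN):
    """Round n up to the next multiple of boundary."""
    return (n + boundary - 1) // boundary * boundary


def tuples_per_page(data_bytes, header_bytes):
    """Estimate how many fixed-size tuples fit on one page."""
    # The header is padded to the alignment boundary, then the whole
    # tuple is padded again when it is placed on the page.
    tuple_size = align(align(header_bytes) + data_bytes)
    return (PAGE_SIZE - PAGE_HEADER) // (tuple_size + LINE_POINTER)


current = tuples_per_page(32, 23)   # today's header, 32 bytes of user data
slim = tuples_per_page(32, 11)      # hypothetical header without MVCC fields
print(current, slim)                # prints "136 157"
```

Under those assumptions a 32-byte row goes from 136 to 157 tuples per
page, about 15% more. Alignment padding eats part of the saving, so the
gain depends heavily on row width.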

Reading this reminds me of the design of the store for a persistent
operating system called EROS. It has a very good paper[1] describing
the design (implementation and careful benchmarking thereof) that I
think could be a useful read.

A lot of your design sounds like the EROS store, with the
"Checkpoint Area" being, in current and all previous versions of
Postgres, the only place data is stored. Data in EROS also has a "Home
Location", which is where the data ends up after a checkpoint, and sounds
somewhat like the proposed read-only segments.

Checkpoints here serve a similar purpose to checkpoints in PG, so the
following analogy may get a bit confusing. When you're reading the
paper I'd try and imagine the checkpoints not occurring as one event,
but spread across time as the database recognizes that data is now (or
has been marked as) read-only. The home locations would then store
only the read-only copies of the data and all the churn would (if the
recognition of read-only data works) be done in the checkpoint area.
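
To make the analogy concrete, here is a toy Python model of it. This is
only a sketch of the picture described above, not of how EROS or
Postgres actually work; the class and method names (Store,
mark_read_only and so on) are made up for illustration.

```python
class Store:
    """Toy model: all churn happens in the checkpoint area, and data
    migrates to a home location once it is recognised as read-only."""

    def __init__(self):
        self.checkpoint_area = {}   # segment id -> mutable list of rows
        self.home_locations = {}    # segment id -> frozen tuple of rows

    def write(self, seg, row):
        # Writes only ever land in the checkpoint area.
        if seg in self.home_locations:
            raise ValueError(f"segment {seg} is read-only")
        self.checkpoint_area.setdefault(seg, []).append(row)

    def mark_read_only(self, seg):
        # The "checkpoint" for one segment: it happens per segment,
        # spread across time, rather than as one global event.
        rows = self.checkpoint_area.pop(seg)
        self.home_locations[seg] = tuple(rows)

    def read(self, seg):
        if seg in self.home_locations:
            return list(self.home_locations[seg])
        return list(self.checkpoint_area.get(seg, []))


store = Store()
store.write(1, "old row")
store.write(2, "hot row")
store.mark_read_only(1)   # segment 1 moves to its home location
```

After the last line, segment 1 lives only in home_locations and
segment 2 is still taking writes in checkpoint_area.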

Sam

[1] http://www.eros-os.org/papers/storedesign2002.pdf
