Re: Free Space Map data structure

From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Free Space Map data structure
Date: 2008-04-09 12:49:58
Message-ID: 1207745398.7189.21.camel@huvostro
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Wed, 2008-04-09 at 10:17 +0530, Pavan Deolasee wrote:
> On Wed, Apr 9, 2008 at 12:22 AM, Heikki Linnakangas
> <heikki(at)enterprisedb(dot)com> wrote:
>
> >
> > Well, if you add the higher levels, we're no longer talking about O(1), but
> > O(log n) (for a pretty steep logarithm), just like my proposal.
> >
>
> For updates, I would still call it O(1) because the number of levels is limited
> and a very small number.
>
>
> >
> > It's actually not that orthogonal. You describe it as a hierarchical
> > bitmap, but it's essentially the same idea as the binary heap/tree, just
> > with way more than 2 children per parent.
> >
>
> Yes, I agree.
>
> Btw, I agree with Tom's point about the lock contention on the higher levels
> for FSM updates. What we can do is during normal operations, FSM pages
> are updated with a SHARE lock - so the information can best be considered
> only as hint. Hopefully, if we are only doing bit flipping, we won't go wrong
> often. And periodically, VACUUM would correct any mistakes in FSM info.

Sabing 1 byte is an atomic op so if we update going from branches to
root and read from root to branches, then we can probably detect
inconsistencies when reading and then re-read.

What I'm more concerned about is L1 cache swapping between several
processors/cores, which could lead to content switch storms we used to
have in locking.

anyway, one thing to do for sure is to avoid storing info for new/last
page(s) appended by any process before that process is done filling it,
as that would lead to excessive up-tree max_free updates.

BTW, I'm pretty sure I have figured out the method for storing minimal
required binary tree as an array, where adding each page adds exactly
one upper node. The node added is often not the immediate parent, but is
always the one needed for covering all nodes.

I just have to write it up in an understandable way and then you all can
look at it and tell if it is something well-known from Knuth or Date ;)

--------------
Hannu

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashish Sharma 2008-04-09 13:03:24 SQL fast in PSQL, very slow using MS.NET driver
Previous Message Zeugswetter Andreas OSB SD 2008-04-09 12:43:31 Re: SET TRANSACTION not compliant with SQL:2003