Re: New FSM patch

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New FSM patch
Date: 2008-09-17 14:30:47
Message-ID: 48D11497.8050303@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Zdenek Kotala wrote:
> I tested it with DTrace on Solaris 10 and 8CPUs SPARC machine. I got
> similar result as you. Main problem in your new implementation is
> locking. On small tables where FSM fits on one page clients spend about
> 3/4 time to waiting on page lock.

That in itself is not that surprising. The test case does nothing else
than exercise the FSM, so it's pretty obvious that all the time will be
spent in the FSM. And they will be serialized on the lock on the single
FSM page, like they are currently serialized on the FreeSpaceMapLock.

I think it's pretty unlikely we'd run into FSM congestion on a small
table in real life. If you're (non-HOT) updating a table at such a high
rate, it's not going to stay small for long.

> On medium tables (2level FSM) then
> InsertWal lock become significant - it takes 1/4 of waiting time. Page
> waiting takes "only" 1/3.

That's a more interesting case from scalability point of view.

For the case of a medium size table, we could implement the optimization
of a "fast-root", similar to B-tree, to speed up the searches. If the
heap is small enough that not all FSM levels are needed, we could simply
start searches from the lower levels. That should reduce the page
locking significantly. However, when searching, the upper levels are
locked in shared, not exclusive, mode, so it's not clear if that would
help with that 1/3 that's now spent in page waiting.

> Suggestions:
>
> 1) remove WAL logging. I think that FSM record should be recovered
> during processing of others WAL records (like insert, update). Probably
> only we need full page write on first modification after checkpoint.

Hmm, we don't have the infrastructure to do that, but I guess it's
doable. In case of a crash, the FSM information after recovery wouldn't
obviously be up-to-date. And the FSM information in a warm standby would
also lag behind.

One option would be to put RecordPageWithFreeSpace() calls into
heap_redo, so that the FSM would be updated based on the WAL records we
write anyway.

I think we'd still need to WAL log operations that decrease the amount
of free space on page. Otherwise, after we have partial vacuum, we might
never revisit a page, and update the FSM, even though there's usable
space on the page, leaving the space forgotten forever.

And the torn page problem would need to be handled somehow.

> 2) break lock - use only share lock for page locking and divide page for
> smaller part for exclusive locking (at least for root page)

That sounds difficult. But I don't think it's very likely to have
congestion on a single FSM page in real life. Unless the root page
becomes a bottleneck. Can you figure out how much of the lock waits come
from the upper level and root pages?

> However, your test case is too artificial.

> I'm going to run OLTP
> workload and test it with "real" workload.

Agreed. It should be possible to emulate the pattern the FSM really sees:

- tables become fuller and fuller, until no more tuples fill. Vacuum
sweeps through them periodically, adding a lot of free space to each page.
- RecordPageWithFreeSpace is only called when a tuple being inserted
doesn't fit on the previous page the backend inserted to.

However, even if these artificial tests show that the new implementation
is X times slower than the old one, the question that they can't answer
is whether it matters or not. If only say 0.01% of the time was spent in
FSM, a 10x slowdown would be acceptable. A full-blown OLTP benchmark
should give some clue on that.

Other important test cases are various bulk insert kind of tests.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2008-09-17 14:32:22 Re: Column level privileges was:(Re: Extending grant insert on tables to sequences)
Previous Message Andrew Chernow 2008-09-17 14:29:23 Re: [PATCHES] libpq events patch (with sgml docs)