Re: Data archiving/warehousing idea

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Chris Dunlop" <chris(at)onthe(dot)net(dot)au>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Data archiving/warehousing idea
Date: 2007-02-01 11:25:40
Message-ID: 1170329141.3681.501.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2007-02-01 at 15:03 +1100, Chris Dunlop wrote:

> > A different approach discussed earlier involves greatly
> > restricting the way in which the table is used. This table
> > could only be written to if an exclusive lock is held; on
> > error or ABORT, the table is truncated.
>
> You're talking about the "no-WAL" concept? Not quite the same
> thing I think, but perhaps complimentary to the ARCHIVE idea: I
> wouldn't expect an ARCHIVE table to need to generate any WAL
> entries as it would be read only.

Setting an option to reduce the size of the row headers needs to be done
before its loaded, not after. If you mark a table as APPEND-only it
would be possible to save 4 bytes off the row header.

Saving a further 8-bytes off the row header by dropping the xmin and
commandid fields isn't a very workable solution since those fields
provide visibility within a transaction and across transactions. You'd
end up with a table that could only be loaded by one transaction and
only touched by one command within that transaction. That would require
bookkeeping on the relation itself (in pg_class) and an
AccessExclusiveLock. In those circumstances you would be able to save on
writing WAL as well. You'd be basically saying that it could only be
loaded by a special utility and would be read-only once loaded.

I'm not crazy about those ideas, even though I think I suggested them
some time back. They're very special case and would probably require
many functions to handle multiple cases, so that additional complexity
and cost would effect all users.

My main doubt relates to the data you're storing. If the events you're
interested in have so little information associated with them that they
are a thin table then storing them at all is questionable. I've been
involved at the design stage of a number of data warehouses and the
amount of data eventually stored is typically < 10% of the number people
first thought of, with number of rows decreasing drastically and the
number of columns increasing slightly as people try to gain additional
value from their data storage. So row overhead should be less of a
concern.

It is possible to save on WAL by doing COPY LOCK or using a table with
different resilience characteristics, both ideas are already on the
TODO. There are also better, less intrusive ways of reducing data volume
and improving load performance.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2007-02-01 11:27:05 Re: pg_restore fails with a custom backup file
Previous Message Tino Wildenhain 2007-02-01 11:15:48 Re: "May", "can", "might"