Re: A more general approach (Re: Data archiving/warehousing idea)

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gavin Sherry <swm(at)alcove(dot)com(dot)au>, Chris Dunlop <chris(at)onthe(dot)net(dot)au>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: A more general approach (Re: Data archiving/warehousing idea)
Date: 2007-02-02 10:26:49
Message-ID: 1170412010.3989.24.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ühel kenal päeval, N, 2007-02-01 kell 12:31, kirjutas Tom Lane:
> Hannu Krosing <hannu(at)skype(dot)net> writes:
> > A more radical variation of the "restricted-use archive table" approach
> > is storing all tuple visibility info in a separate file.
> > At first it seems to just add overhead, but for lots (most ? ) usecases
> > the separately stored visibility should be highly compressible, so for
> > example for bulk-loaded tables you could end up with one bit per page
> > saying that all tuples on this page are visible.
>
> The more you compress, the slower and more complicated it will be to
> access the information. I'd put my money on this being a net loss in
> the majority of scenarios.

define "majority" :)

In real life it is often faster to access compressed information,
especially if it is stored in something like trie where compression and
indeked access are the same thing.

the most gain will of course come from bulk-loaded data, where the
"compressed" representation can just say something like "pages 1 to
200000 are all visible starting from transaction 5000 and so is first
half of page 200001, second half of 200001 and pages up to 250000 are
visible from trx 6000.

In this case the visibility info will always stay in L1 cache and thus
be really fast, maybe even free if we account for cache reloads and
such.

But it may be better to still have a bitmap there for sake of simplicity
and have some of it be accessible from L2 cache (200000 pages times say
2 bits is still only 100kB bitmap for 1.6GB of data).

Of course there are cases where this approach is worse, sometimes much
woorse, than current one, but the possibility of independently
compressing visibility info and making some types of VACUUM vastly
cheaper may make it a net win in several cases.

Also, for higly dynamic tables a separate visibility heap might also
speed up bitmap scans, as access to heap happens only for visible
tuples.

This can also be one way to get rid of need to write full data tuples
thrice - first the original write, then commit bits and then deleted
bits. instead we can just write the bits once for a whole set of tuples.

Usually most of visibility info can be thrown out quite soon, as the
active transaction window advances, so the first level of "compression"
is just thtrowing out cmin/cmax and setting commit bit, then setting the
tuple to just "visible".

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Csaba Nagy 2007-02-02 11:01:54 Re: Referential Integrity and SHARE locks
Previous Message Richard Huxton 2007-02-02 10:25:37 Re: Referential Integrity and SHARE locks