Quick Links

Re: A Silly Idea for Vertically-Oriented Databases

From:	Avery Payne <apayne(at)pcfruit(dot)com>
To:	simon(at)2ndquadrant(dot)com
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: A Silly Idea for Vertically-Oriented Databases
Date:	2007-09-10 22:16:10
Message-ID:	46E5C22A.3010107@pcfruit.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
</head>
<body bgcolor="#ffffff" text="#000000">
<div id="pgContentWrap">
<div id="txtArchives">
<pre>>ISTM we would be able to do this fairly well if we implemented
>Index-only columns. i.e. columns that don't exist in the heap, only in
>an index.

>Taken to the extreme, all columns could be removed from the heap and
>placed in an index(es). Only the visibility information would remain on
>the heap.

So, let me understand this correctly - you want to index the columns and
use the index to reconstruct the data? Some kind of "implicit" reconstruction?

>Doing this per column would be a big win over vertical databases
>since AFAIK they *have* to do this to *every* column, even if it is not
>beneficial to do so.

<snip>

I was thinking about something a little more crude - each column being a free-standing
table, but being "viewed" by the client as a single entity, a kind of "data federation".
The idea was that the existing storage mechanism wouldn't be altered, and
with a little slight-of-hand, we could extend the mechanism without hampering things
like clustering, future extensions, optimizations, etc. No changes to MVCC would be
needed, because it would above and through it.

The idea was that for a "wide" table you target only a few columns, so you could
sequential read without the penalty of having to seek to a new offset for each record.
Instead of processing all the columns, you process a single column, which means less
data to read for the same number of records. That gain starts to slope off
when you specify more and more columns from the table.

Throw in selective indexing (similar to what you were talking about) and suddenly we can reclaim
some of that lost speed. We'll make a kind of "compressed index", where the key turns into
a hash that points to (is attached to?) a bucket, and the bucket contains the offset of all the
records that relate to that key. Other tricks can be employed, such as run-length encoding
entire ranges of offsets, etc. to keep this really really small. Really small = faster and faster
to read, using more CPU than I/O. And I/O is still more of an issue than CPU at this point.

Then again, if you ditch the column altogether and use a "compressed index" to reconstruct
data implicitly, now we're close to what you were talking about (assuming I understand you
correctly and also assuming that PostgreSQL doesn't already do this with indexes). So, if the
column is indexed, then maybe split it off into a "compressed index", and if not, keep it in the main
table outright?

I guess I really need to think about this a bit more before I start delving into code.

>I thought maybe we can call it COAST, Column-oriented attribute storage technique, :-)

I like it. :-)<a rel="nofollow" href="http://www.2ndQuadrant.com"></a> I just wish I would have read this before applying for a project name
at pgfoundry, the current proposal is given as "pg-cstore".
</pre>
</div>
</div>
</body>
</html>

Attachment	Content-Type	Size
unknown_filename	text/html	3.4 KB

Responses

Re: A Silly Idea for Vertically-Oriented Databases at 2007-09-10 22:44:45 from Alvaro Herrera

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2007-09-10 22:44:45	Re: A Silly Idea for Vertically-Oriented Databases
Previous Message	Alvaro Herrera	2007-09-10 22:07:00	Re: reindexdb hangs