Re: Including Snapshot Info with Indexes

From: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To: "Trevor Talbot" <quension(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Including Snapshot Info with Indexes
Date: 2007-10-14 16:30:20
Message-ID: 9362e74e0710140930p5a68fb9fi13cdb08a52413f7b@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

On 10/14/07, Trevor Talbot <quension(at)gmail(dot)com> wrote:
>
> On 10/14/07, Gokulakannan Somasundaram <gokul007(at)gmail(dot)com> wrote:
>
> > http://www.databasecolumn.com/2007/09/one-size-fits-all.html
>
> > > > The Vertica database(Monet is a open source version with the same
> > > > principle) makes use of the very same principle. Use more disk space,
> > > > since they are less costly and optimize the data warehousing.
>
> > What i meant there was, it has duplicated storage of certain columns of the
> > table. A table with more than one projection always needs more space, than a
> > table with just one projection. By doing this they are reducing the number
> > of disk operations. If they are duplicating columns of data to avoid reading
> > un-necessary information, we are duplicating the snapshot information to
> > avoid going to the table.
>
> Was this about Vertica or MonetDB? I saw that article a while ago,
> and I didn't see anything that suggested Vertica duplicated data, just
> that it organized it differently on disk. What are you seeing as
> being duplicated?

Hi Trevor,

This is a good paper to read about the basics of column-oriented databases:
http://db.lcs.mit.edu/projects/cstore/vldb.pdf

If you go to Section 2 (Data Model), the authors present the data model using
a sample EMP table.

The example shows that the EMP table contains four columns: Name, Age, Dept
and Salary.
From this table, projections are formed (in the paper, they show the
creation of four projections for Example 1):
EMP1 (name, age)
EMP2 (dept, age, DEPT.floor)
EMP3 (name, salary)
DEPT1(dname, floor)

As you can see, the same column information gets duplicated in different
projections.
The advantage is that a query involving only name and age need not read the
other columns. But the storage requirements go up, since there is
redundancy. As you may know, increasing data redundancy helps selects at the
cost of inserts, updates and deletes.
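
To make the trade-off concrete, here is a rough sketch in Python. It is only a
toy model of the four projections listed above, not how C-Store actually
stores anything; the function names are made up for illustration, and
DEPT.floor is written simply as "floor".

```python
from collections import Counter

# Columns held by each projection in Example 1 of the paper.
# Simplification (assumption): EMP2's DEPT.floor is recorded as "floor".
SCHEMA = {
    "EMP1": ("name", "age"),
    "EMP2": ("dept", "age", "floor"),
    "EMP3": ("name", "salary"),
    "DEPT1": ("dname", "floor"),
}


def column_copies(schema):
    """Count how many projections store each column (the redundancy)."""
    return Counter(col for cols in schema.values() for col in cols)


def covering_projections(schema, needed):
    """Projections that can answer a query on `needed` columns by themselves."""
    return [name for name, cols in schema.items() if set(needed) <= set(cols)]


def projections_to_update(schema, column):
    """Projections that must be rewritten when `column` changes."""
    return [name for name, cols in schema.items() if column in cols]


if __name__ == "__main__":
    print(column_copies(SCHEMA))
    print(covering_projections(SCHEMA, ["name", "age"]))
    print(projections_to_update(SCHEMA, "age"))
```

A query on name and age is covered by EMP1 alone, so it reads nothing else,
while an update to age has to touch both EMP1 and EMP2.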

This is what I was trying to say.

Thanks,
Gokul.
