Re: WORM and Read Only Tables (v0.1)

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Josh Berkus <fuzzy(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WORM and Read Only Tables (v0.1)
Date: 2007-12-12 08:34:09
Message-ID: 1197448449.4255.1527.camel@ebony.site
Lists: pgsql-hackers

On Tue, 2007-12-11 at 20:30 -0800, Josh Berkus wrote:
> Simon,
>
> > Use Case: VLDB with tons of (now) read only data, some not. Data needs
> > to be accessible, but data itself is rarely touched, allowing storage
> > costs to be minimised via a "storage hierarchy" of progressively cheaper
> > storage.
>
> There's actually 2 cases to optimize for:
> 1) write-once-read-many (WORM)
> 2) write-once-read-seldom (WORS)
>
> The 2nd case is becoming extremely popular due to the presence of
> government-mandated records databases. For example, I'm currently working on
> one call completion records database which will hold 75TB of data, of which
> we expect less than 1% to *ever* be queried.

Well, that's exactly the use case I'm writing for. I called that an
archival data store in my post on VLDB Features.

WORM is a type of storage that might be used, so it would be somewhat
confusing if we used it as the name of a specific use case.

Getting partitioning/read-only right will allow 70+TB of that to be on
tape or similar, and with compression that might be reduced to maybe
20TB. I don't want to promise any particular compression ratio, but it
will make a substantial difference, as I'm sure you realise.

> One of the other things I'd like to note is that for WORM, conventional
> storage is never going to approach column-store DBs for general performance.
> So, should we be working on incremental improvements like the ones you
> propose, or should we be working on integrating a c-store into PostgreSQL on
> a per-table basis?

What I'm saying is that there are some features that all VLDBs need. Even
if we had a column store DB, we would still need partitioning, or the
data structures would become unmanageable. Plus, partitioning can allow
the planner to avoid de-archiving/spinning up data, which helps reduce
storage costs.
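
As a rough illustration of the planner point above, here is a minimal
sketch of range-based partition exclusion. It is not PostgreSQL code: the
partition names, the date ranges, and the "tier" field are all
hypothetical, chosen only to show that a query restricted to recent dates
never has to touch partitions held on archival storage.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Partition:
    name: str     # hypothetical partition name
    start: date   # inclusive lower bound of the partition's date range
    end: date     # exclusive upper bound of the partition's date range
    tier: str     # hypothetical storage tier: "disk" or "tape"


def partitions_to_scan(partitions, query_start, query_end):
    """Keep only partitions whose [start, end) range overlaps the
    query's [query_start, query_end) range; the rest are excluded."""
    return [
        p for p in partitions
        if p.start < query_end and query_start < p.end
    ]


# Hypothetical layout: older years archived, current year on disk.
PARTITIONS = [
    Partition("calls_2005", date(2005, 1, 1), date(2006, 1, 1), "tape"),
    Partition("calls_2006", date(2006, 1, 1), date(2007, 1, 1), "tape"),
    Partition("calls_2007", date(2007, 1, 1), date(2008, 1, 1), "disk"),
]

# A query over November 2007 only overlaps the on-disk partition.
recent = partitions_to_scan(PARTITIONS, date(2007, 11, 1), date(2007, 12, 1))
recent_names = [p.name for p in recent]
recent_dearchive = [p.name for p in recent if p.tier == "tape"]

# A query spanning the 2006/2007 boundary must de-archive one partition.
spanning = partitions_to_scan(PARTITIONS, date(2006, 12, 1), date(2007, 2, 1))
spanning_names = [p.name for p in spanning]
spanning_dearchive = [p.name for p in spanning if p.tier == "tape"]
```

In this sketch the first query never needs anything spun up from tape,
while the second needs exactly one archived partition; without the range
bounds, every partition would have to be made readable.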

Radical change can be good, but it can also take more time, and I dare
say it would be harder for the community to accept. So I look for
worthwhile change in acceptably sized chunks.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
