Re: Hadoop backend?

From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Paul Sheer <paulsheer(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-07-22 03:29:22
Message-ID: 4A668792.7060601@cheapcomplexdevices.com
Lists: pgsql-hackers
Paul Sheer wrote:
> Hadoop backend for PostGreSQL....

Resurrecting an old thread: it seems some guys at Yale have implemented
something very similar to what this thread was discussing.

http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html
> >
> >It's an open source stack that includes PostgreSQL, Hadoop, and Hive, along
> >with some glue between PostgreSQL and Hadoop, a catalog, a data loader, and
> >an interface that accepts queries in MapReduce or SQL and generates query
> >plans that are processed partly in Hadoop and partly in different PostgreSQL
> >instances spread across many nodes in a shared-nothing cluster of machines.

Their detailed paper is here:

  http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf

According to the paper, it scales very well.
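
For readers who want a feel for the split described in the announcement
(part of the work done inside each PostgreSQL instance, part in a combining
step), here is a minimal Python sketch. It is not HadoopDB code and does not
use Hadoop or PostgreSQL at all; the function names, the CRC-based
partitioning, and the in-memory "nodes" are all assumptions made for
illustration only.

```python
import zlib
from collections import defaultdict


def partition(rows, n_nodes, key):
    """Hash-partition rows so each row lives on exactly one node (shared-nothing)."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        # crc32 gives a hash that is stable across runs (hypothetical choice).
        slot = zlib.crc32(str(row[key]).encode("utf-8")) % n_nodes
        nodes[slot].append(row)
    return nodes


def local_aggregate(node_rows, group_key, value_key):
    """Stand-in for the per-node SQL step, e.g. a SUM ... GROUP BY run locally."""
    sums = defaultdict(int)
    for row in node_rows:
        sums[row[group_key]] += row[value_key]
    return dict(sums)


def merge(partials):
    """Stand-in for the reduce step that combines the per-node partial sums."""
    total = defaultdict(int)
    for partial in partials:
        for group, value in partial.items():
            total[group] += value
    return dict(total)


def distributed_sum(rows, n_nodes, partition_key, group_key, value_key):
    """Partition the rows, aggregate on each node, then merge the partial results."""
    nodes = partition(rows, n_nodes, partition_key)
    partials = [local_aggregate(n, group_key, value_key) for n in nodes]
    return merge(partials)


rows = [
    {"id": 1, "region": "eu", "amount": 10},
    {"id": 2, "region": "us", "amount": 7},
    {"id": 3, "region": "eu", "amount": 5},
]
result = distributed_sum(rows, 3, "id", "region", "amount")
```

The point of the sketch is only that the answer does not depend on which
node holds which row: each node computes a partial sum over its own data,
and the combining step adds the partial sums together.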


> A problem that my client has, and one that I come across often,
> is that a database seems to always be associated with a particular
> physical machine, a physical machine that has to be upgraded,
> replaced, or otherwise maintained.
> 
> Even if the database is replicated, it just means there are two or
> more machines. Replication is also a difficult thing to properly
> manage.
> 
> With a distributed data store, the data would become a logical
> object - no adding or removal of machines would affect the data.
> This is an ideal that would remove a tremendous maintenance
> burden from many sites ---- well, at least the ones I have worked
> at as far as I can see.
> 
> Does anyone know of plans to implement PostGreSQL over Hadoop?
> 
> Yahoo seems to be doing this:
>       http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyte-postgresql.html
> 
> But they store tables column-wise for their performance situation.
> If one is doing a lot of inserts, I don't think this is the most efficient layout - ?
> 
> Has Yahoo put the source code for their work online?
> 
> Many thanks for any pointers.
> 
> -paul
> 

