Re: Hadoop backend?

From: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: pi(dot)songs(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hadoop backend?
Date: 2009-02-22 16:39:02
Message-ID: 136182E5-BC7E-4AEB-A2E8-4C225B2F9095@cybertec.at
Lists: pgsql-hackers

hi ...

i think the easiest way to do this is to simply add a mechanism to
functions which allows a function to "stream" data through.
it would basically mean losing join support, as you cannot "read data
again" in a way which is good enough for joining with the
function providing the data from hadoop.
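
a rough sketch of why a purely streaming function and joins do not mix
well. this is plain python, not postgres code; stream_rows and
nested_loop_join are made-up names for illustration, and the "hadoop"
data is just a small list. the streamed source can only be read once,
so a join that needs to rescan it finds nothing after the first pass
unless the data is materialized first.

```python
from typing import Iterable, Iterator, List, Tuple


def stream_rows(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Hypothetical streaming source: yields each row exactly once."""
    for line in lines:
        key, value = line.split(",", 1)
        yield int(key), value


def nested_loop_join(outer, inner) -> List[Tuple[int, str, str]]:
    """Naive nested loop join: rescans the inner side for every outer row."""
    result = []
    for okey, oval in outer:
        for ikey, ival in inner:  # a generator is exhausted after one pass
            if okey == ikey:
                result.append((okey, oval, ival))
    return result


raw = ["1,a", "2,b", "3,c"]          # stand-in for data coming from hadoop
outer = [(1, "x"), (2, "y"), (3, "z")]

# Joining directly against the stream: only the first outer row can match.
streamed = nested_loop_join(outer, stream_rows(raw))

# Materializing the stream first makes it rescannable, at the cost of memory.
materialized = nested_loop_join(outer, list(stream_rows(raw)))
```

with the stream passed in directly, only the first outer row finds a
match; copying the stream into a list first gives the full join result,
which is the "read data again" ability a streaming function gives up.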

hannu (i think) brought up a similar concept some time ago.

i think a straightforward implementation would not be too hard.

best regards,

hans

On Feb 22, 2009, at 3:37 AM, pi song wrote:

> 1) Hadoop file system is heavily optimized for read-mostly workloads
> 2) As of a few months ago, hdfs doesn't support file appending.
>
> There might be a bit of impedance to make them go together.
>
> However, I think it would be a very good initiative to come up with
> ideas to be able to run postgres on distributed file system (doesn't
> have to be specific hadoop).
>
> Pi Song
>
> On Sun, Feb 22, 2009 at 7:17 AM, Paul Sheer <paulsheer(at)gmail(dot)com>
> wrote:
> Hadoop backend for PostGreSQL....
>
> A problem that my client has, and one that I come across often,
> is that a database seems to always be associated with a particular
> physical machine, a physical machine that has to be upgraded,
> replaced, or otherwise maintained.
>
> Even if the database is replicated, it just means there are two or
> more machines. Replication is also a difficult thing to properly
> manage.
>
> With a distributed data store, the data would become a logical
> object - no adding or removal of machines would affect the data.
> This is an ideal that would remove a tremendous maintenance
> burden from many sites ---- well, at least the ones I have worked
> at as far as I can see.
>
> Does anyone know of plans to implement PostGreSQL over Hadoop?
>
> Yahoo seems to be doing this:
> http://glinden.blogspot.com/2008/05/yahoo-builds-two-petabyte-postgresql.html
>
> But they store tables column-ways for their performance situation.
> If one is doing a lot of inserts I don't think this is most
> efficient - ?
>
> Has Yahoo put the source code for their work online?
>
> Many thanks for any pointers.
>
> -paul
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de
