From: Boris Köster <koester(at)x-itec(dot)de>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: Gunther Schadow <gunther(at)aurora(dot)regenstrief(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Mass-Data question
Date: 2002-04-16 09:45:35
Message-ID: 1135213696.20020416114535@x-itec.de
Lists: pgsql-general
Hello Curt,
Tuesday, April 16, 2002, 5:25:25 AM, you wrote:
>> Hmm, interesting. I have similar needs.
CS> As do I. Unfortunately, I'm not a guru. But I'll be testing out
CS> something like this in the next few weeks if all goes well. I was
CS> planning to do some fairly simple data partitioning. My initial
CS> plan is to drop the data into multiple tables across multiple
CS> servers, partitioned by date, and have a master table indicating
CS> the names of the various tables and the date ranges they cover.
Aha, interesting.
CS> The application will then deal with determining which tables the
CS> query will be spread across, construct and submit the appropriate
CS> queries (eventually in parallel, if I'm getting a lot of queries
CS> crossing multiple tables), and collate the results.
Parallel querying sounds very interesting to me. My current plan was
to do parallel writing, because the hard drives are not fast enough to
collect all the data; your idea of parallel reading is very
interesting.
I have written a C++ library to access mysql+postgresql databases. My
OS is FreeBSD, but I think it should work with other OSes, too.
Doing parallelized reading/writing does not sound very complex in
itself, but getting the results back in the right order is a problem.
Maybe I could collect data in parallel from several machines via
threads, writing the content to a (new) machine (?) as long as the
number of rows is not higher than x rows, to avoid overrunning the
disk. The advantage would be that if this works, the feature could be
used with both pgsql and mysql.
----------                     ----------
 rdbms1          ...            rdbms[n]
----------                     ----------
    |                              |
    |                              |
    -------------------------------
                   |
                   | distributed writing for logfiles
                   | or similar into databases
                   |
                   |          -------------
                   |--------- | rdbms-tmp |  temporary db-server (?)
                   |          -------------  to analyze the data for
                   |               |         parallelized reading, like
                   |               |         a temporary space... ?
                   |               |
                   |               |----> Customer-Access for analyzing
    -------------------------------
     Machine with Memory-Queue implementation for fast reading/writing
     "Collector for writing and distributing the content"
    -------------------------------
                   |
                   |
                Internet
----------                     ----------
 client1         ...            client[n]
----------                     ----------
What do the GURUs think about this? I need this functionality within
the next 1-2 months, and I could try to code it as a C++ library. If
the concept is not bogus, the only question left is whether I should
give out the source for free or not; this is no solution for a
home-user *gg
I have no idea.
--
Best regards,
Boris Köster mailto:koester(at)x-itec(dot)de