Re: multi billion row tables: possible or insane?

From: Alan Stange <stange(at)rentec(dot)com>
To: "Vig, Sandor (G/FI-2)" <Sandor(dot)Vig(at)audi(dot)hu>
Cc: "'John Arbash Meinel'" <john(at)arbash-meinel(dot)com>, "'pgsql-performance(at)postgresql(dot)org'" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: multi billion row tables: possible or insane?
Date: 2005-03-01 15:57:54
Message-ID: 42249102.2060700@rentec.com
Lists: pgsql-performance

Isn't that 385 rows/second? Presumably one can insert more than one
row per transaction?
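
Something along these lines, say, with a made-up "readings" table -- the
point being one fsync per COMMIT instead of one per row:

    BEGIN;
    INSERT INTO readings (ts, sensor_id, value) VALUES (now(), 1, 42.0);
    INSERT INTO readings (ts, sensor_id, value) VALUES (now(), 2, 17.5);
    -- ...hundreds more rows in the same transaction...
    COMMIT;

At 100 rows per transaction, 385 rows/second is fewer than 4 commits per
second, which any disk can handle. COPY would be faster still for pure
loading.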

-- Alan

Vig, Sandor (G/FI-2) wrote:

>385 transactions/sec?
>
>fsync = false
>
>risky but fast.
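>
>For reference, the relevant postgresql.conf knobs; group commit via
>commit_delay is a less drastic option than turning fsync off (treat the
>numbers as a sketch, not tuned values):
>
>    fsync = false           # fast, but a crash can corrupt the database
>    # safer: leave fsync on and let concurrent commits share one flush
>    #commit_delay = 100     # microseconds to wait for companion commits
>    #commit_siblings = 5    # only wait if this many other xacts are open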
>
>-----Original Message-----
>From: pgsql-performance-owner(at)postgresql(dot)org
>[mailto:pgsql-performance-owner(at)postgresql(dot)org]On Behalf Of John Arbash
>Meinel
>Sent: Tuesday, March 01, 2005 4:19 PM
>To: Ramon Bastiaans
>Cc: pgsql-performance(at)postgresql(dot)org
>Subject: Re: [PERFORM] multi billion row tables: possible or insane?
>
>
>Ramon Bastiaans wrote:
>
>
>
>>Hi all,
>>
>>I am doing research for a project of mine where I need to store
>>several billion values for a monitoring and historical tracking system
>>for a big computer system. My current estimate is that I have to store
>>(somehow) around 1 billion values each month (possibly more).
>>
>>
>>
>If that 1 billion rows per month is spread perfectly evenly over time,
>then you need 1e9/30/24/3600 = ~385 transactions per second.
>
>Which I'm pretty sure is possible with postgres; you just need pretty
>beefy hardware, and, like Jeff said, lots of disks for lots of I/O:
>something like a quad Opteron with 16GB of RAM and around 14-20 very
>fast disks, RAID 10 rather than RAID 5, etc. To improve query
>performance, you can also load-balance reads across replicas using
>Slony.
>
>Or if you can do batch processing, you could split up the work into a
>few update machines, which then do bulk updates on the master database.
>This lets you get more machines into the job, since you can't share a
>database across multiple machines.
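>
>Roughly (file and table names made up): each update machine writes a
>flat file, and the master loads each batch in a single transaction:
>
>    BEGIN;
>    -- bulk-load the batch into a staging table
>    COPY readings_staging FROM '/data/batch_20050301.csv' WITH DELIMITER ',';
>    -- merge it into the big table, then clear the staging area
>    INSERT INTO readings SELECT * FROM readings_staging;
>    TRUNCATE readings_staging;
>    COMMIT;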
>
>
>
>>I was wondering if anyone has had any experience with this kind of
>>data volume in a postgres sql database and how it affects
>>database design and optimization.
>>
>>
>>
>Well, one of the biggest things is whether you can do bulk updates, and
>whether clients can handle data being slightly out of date, so you can
>use caching. Can you segregate your data into separate tables as much as
>possible? Are your clients okay if aggregate information takes a little
>while to update?
>
>One trick is to use semi-lazy materialized views to get your updates to
>be fast.
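>
>Since postgres has no built-in materialized views, "semi-lazy" here just
>means a summary table rebuilt by cron every few minutes, roughly (all
>names invented):
>
>    -- summary table the clients read instead of the raw data
>    CREATE TABLE readings_hourly (
>        hour      timestamp,
>        sensor_id integer,
>        avg_value double precision
>    );
>
>    -- periodic refresh, redoing only the recent window
>    BEGIN;
>    DELETE FROM readings_hourly WHERE hour >= now() - interval '1 day';
>    INSERT INTO readings_hourly
>        SELECT date_trunc('hour', ts), sensor_id, avg(value)
>          FROM readings
>         WHERE ts >= now() - interval '1 day'
>         GROUP BY 1, 2;
>    COMMIT;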
>
>
>
>>What would be the important issues when setting up a database this big,
>>and is it at all doable? Or would it be insane to think about
>>storing up to 5-10 billion rows in a postgres database.
>>
>>
>
>I think if you can design the db properly, it is doable. But if you
>have clients saying "I need up-to-the-second information on 1 billion
>rows", you're never going to get it.
>
>
>
>>The database's performance is important. There would be no use in
>>storing the data if queries take ages. Queries should be quite
>>fast if possible.
>>
>>
>>
>Again, it depends on the queries being done.
>There are some nice tricks you can use, like month-by-month
>partitioning (if you are getting 1G rows a month, you might want
>week-by-week partitioning): with a date-column index on each partition
>and a UNION ALL view over them, you should be able to get pretty good
>insert speed and still keep *recent* queries fast. Going through all
>1 billion rows is always going to be expensive.
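>
>Sketched out (table and column names made up), one table per month with
>its own date index, glued together by a view:
>
>    CREATE TABLE readings_2005_02 (ts timestamp, sensor_id integer,
>                                   value double precision);
>    CREATE TABLE readings_2005_03 (LIKE readings_2005_02);
>    CREATE INDEX readings_2005_02_ts_idx ON readings_2005_02 (ts);
>    CREATE INDEX readings_2005_03_ts_idx ON readings_2005_03 (ts);
>
>    CREATE VIEW readings AS
>        SELECT * FROM readings_2005_02
>        UNION ALL
>        SELECT * FROM readings_2005_03;
>
>Inserts go directly into the current month's table. A query like
>
>    SELECT * FROM readings WHERE ts >= '2005-03-01';
>
>gets its WHERE clause pushed into each arm of the view, so the indexes
>let the off-range months be skipped almost for free.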
>
>
>
>>I would really like to hear people's thoughts/suggestions or "go see a
>>shrink, you must be mad" statements ;)
>>
>>Kind regards,
>>
>>Ramon Bastiaans
>>
>>
>
>I think it would be possible, but there are a lot of design issues with
>a system like this. You can't go into it thinking that you can design a
>multi-billion-row database the same way you would design a million-row db.
>
>John
>=:->
>
