Re: multi billion row tables: possible or insane?

From: Alex Turner <armtuk(at)gmail(dot)com>
To: "Vig, Sandor (G/FI-2)" <Sandor(dot)Vig(at)audi(dot)hu>
Cc: John Arbash Meinel <john(at)arbash-meinel(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: multi billion row tables: possible or insane?
Date: 2005-03-05 00:15:55
Message-ID: 33c6269f05030416151b826a9c@mail.gmail.com
Lists: pgsql-performance

Not true - with fsync on I get nearly 500 tx/s; with it off I get as
high as 1600 tx/s on a dual Opteron with 14x SATA drives and 4GB of RAM
on a 3ware Escalade controller. The database has 3 million rows.
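
For anyone following along, the knob in question is the fsync setting
in postgresql.conf. A minimal sketch (the tx/s figures are from the box
above and will vary with your hardware):

    # postgresql.conf
    # fsync forces the WAL to disk at every commit. Turning it off
    # trades crash safety for speed: a power failure can corrupt the
    # database, so only do it if you can rebuild from scratch.
    fsync = true        # safe; ~500 tx/s on the hardware above
    #fsync = false      # fast (~1600 tx/s here) but risky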

As long as queries use indexes, a multi-billion-row table shouldn't be
too bad. A full table scan will suck, though.
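
A quick sanity check is to EXPLAIN your typical queries and make sure
you see an index scan, not a sequential scan. A sketch against a
hypothetical monitoring table (names invented for illustration):

    EXPLAIN SELECT value FROM samples
    WHERE sampled_at >= '2005-03-01' AND sampled_at < '2005-03-02';
    -- good: Index Scan using samples_sampled_at_idx on samples ...
    -- bad:  Seq Scan on samples (a full pass over billions of rows)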

Alex Turner
netEconomist

On Tue, 1 Mar 2005 16:40:29 +0100, Vig, Sandor (G/FI-2)
<Sandor(dot)Vig(at)audi(dot)hu> wrote:
> 385 transactions/sec?
>
> fsync = false
>
> risky but fast.
>
> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
> [mailto:pgsql-performance-owner(at)postgresql(dot)org]On Behalf Of John Arbash
> Meinel
> Sent: Tuesday, March 01, 2005 4:19 PM
> To: Ramon Bastiaans
> Cc: pgsql-performance(at)postgresql(dot)org
> Subject: Re: [PERFORM] multi billion row tables: possible or insane?
>
> Ramon Bastiaans wrote:
>
> > Hi all,
> >
> > I am doing research for a project of mine where I need to store
> > several billion values for a monitoring and historical tracking system
> > for a big computer system. My current estimate is that I have to store
> > (somehow) around 1 billion values each month (possibly more).
> >
> If you have that 1 billion perfectly distributed over all hours of the
> day, then you need 1e9/30/24/3600 = 385 transactions per second.
>
> Which I'm pretty sure is possible with postgres; you just need pretty
> beefy hardware. And like Jeff said, lots of disks for lots of I/O.
> Like a quad Opteron with 16GB of RAM and around 14-20 very fast disks,
> raid10 not raid5, etc. To improve query performance, you can do some
> load balancing by replicating to additional machines with Slony.
>
> Or if you can do batch processing, you could split up the work into a
> few update machines, which then do bulk updates on the master database.
> This lets you get more machines into the job, since you can't share a
> database across multiple machines.
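>
> Something like the following is the usual shape of that (untested
> sketch, table and file names made up): COPY each batch into a staging
> table, then fold it into the main table in one statement.
>
>     COPY staging_samples FROM '/data/batch_20050301.dat';
>     INSERT INTO samples (host_id, sampled_at, value)
>         SELECT host_id, sampled_at, value FROM staging_samples;
>     TRUNCATE staging_samples;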
>
> > I was wondering if anyone has had any experience with these kind of
> > big numbers of data in a postgres sql database and how this affects
> > database design and optimization.
> >
> Well, one of the biggest things is whether you can batch updates, and
> whether clients can handle data being slightly out of date, so you can
> use caching. Can you segregate your data into separate tables as much as
> possible? Are your clients okay if aggregate information takes a little
> while to update?
>
> One trick is to use semi-lazy materialized views to get your updates to
> be fast.
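>
> By semi-lazy I mean a plain summary table refreshed on a schedule
> rather than on every insert (untested sketch, names made up):
>
>     CREATE TABLE daily_summary (
>         day        date PRIMARY KEY,
>         n_samples  bigint,
>         avg_value  numeric
>     );
>     -- run from cron every few minutes; clients read daily_summary
>     -- instead of scanning the raw table
>     DELETE FROM daily_summary WHERE day = current_date;
>     INSERT INTO daily_summary
>         SELECT sampled_at::date, count(*), avg(value)
>         FROM samples
>         WHERE sampled_at >= current_date
>         GROUP BY 1;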
>
> > What would be important issues when setting up a database this big,
> > and is it at all doable? Or would it be insane to think about
> > storing up to 5-10 billion rows in a postgres database.
>
> I think if you can design the db properly, it is doable. But if you
> have clients saying "I need up-to-the-second information on 1 billion
> rows", you're never going to get it.
>
> >
> > The database's performance is important. There would be no use in
> > storing the data if a query will take ages. Queries should be quite
> > fast if possible.
> >
> Again, it depends on the queries being done.
> There are some nice tricks you can use, like doing a month-by-month
> partitioning (if you are getting 1G inserts, you might want week-by-week
> partitioning), and then with a date column index, and a union all view
> you should be able to get pretty good insert speed, and still keep fast
> *recent* queries. Going through 1 billion rows is always going to be
> expensive.
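>
> The shape of the partitioning trick, roughly (untested sketch, layout
> invented; repeat the table and index per month):
>
>     CREATE TABLE samples_2005_03 (
>         host_id    int,
>         sampled_at timestamptz,
>         value      numeric
>     );
>     CREATE INDEX samples_2005_03_date_idx ON samples_2005_03 (sampled_at);
>
>     -- queries go through the view; inserts go straight into the
>     -- current month's table
>     CREATE VIEW all_samples AS
>         SELECT * FROM samples_2005_02
>         UNION ALL
>         SELECT * FROM samples_2005_03;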
>
> > I would really like to hear people's thoughts/suggestions or "go see a
> > shrink, you must be mad" statements ;)
> >
> > Kind regards,
> >
> > Ramon Bastiaans
>
> I think it would be possible, but there are a lot of design issues with
> a system like this. You can't go into it thinking that you can design a
> multi-billion-row database the same way you would design a million-row db.
>
> John
> =:->
