Re: When should I worry?

From: Tom Allison <tom(at)tacocat(dot)net>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Bill Moran <wmoran(at)potentialtech(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: When should I worry?
Date: 2007-06-10 22:25:59
Message-ID: 4CB34EF5-73AC-4455-AC4B-781B9D91D947@tacocat.net
Lists: pgsql-general


On Jun 10, 2007, at 2:14 PM, Joe Conway wrote:

>
> Bill Moran wrote:
>> Tom Allison <tom(at)tacocat(dot)net> wrote:
>>>
>>> If the user base gets to 100 or more, I'll be hitting a billion
>>> rows before too long. I add about 70,000 rows per user per day.
>>> At 100 users this is 7 million rows per day. I'll hit a billion
>>> in 142 days, call it six months for simplicity.
>>>
>>> The table itself is small (two columns: bigint, int) but I'm
>>> wondering when I'll start to hit a knee in performance and how I
>>> can monitor that. I know where I work (day job) they have Oracle
>>> tables with a billion rows that just plain suck. I don't know
>>> if a billion is bad or if the DBAs were not given the right
>>> opportunity to make their tables work.
>>>
>>> But if they are any indication, I'll be feeling some hurt when I
>>> exceed a billion rows. Am I going to just fold up and die in six
>>> months?
>
> A lot depends on your specific use case.
>
> - Will you be just storing the data for archival purposes, or
> frequently querying the data?
>
> - If you need to run queries, are they well bounded to certain
> subsets of the data (e.g. a particular range of time for a
> particular user) or are they aggregates across the entire billion
> rows?
>
> - Is the data temporal in nature, and if so do you need to purge it
> after some period of time?
>
> As an example, I have an application with temporal data, that needs
> periodic purging, and is typically queried for small time ranges
> (tens of minutes). We have set up partitioned tables (partitioned
> by date range and data source -- akin to your users) using
> constraint exclusion that contain 3 or 4 billion rows (total of all
> partitions), and we have no problem at all with performance. But I
> suspect that if we needed to do an aggregate across the entire
> thing it would not be particularly fast ;-)
>
>> Why not just create a simulation of 100 users and run it as hard
>> as you can until it starts to degrade? Then you'll have some
>> real-world experience to tell you how much you can handle.
>
> This is good advice. Without much more detail, folks on the list
> won't be able to help much, but with a simulation such as this you
> can answer your own question...
>
> Joe
>

Good questions. I guess there are two answers. There are times when
I will want aggregate data, and for those I'm not as concerned about
execution time. But there are other queries that are part of the
application design. These are always of a type where I know a single
specific primary key value and want to find all the rows related to
it.

The first table has a key column:
  idx serial primary key
The third table has a key column:
  idx bigserial primary key

and the second table (the billion-row table) consists of two columns:
  first_idx integer not null references first(idx) on delete cascade,
  third_idx bigint not null references third(idx) on delete cascade,
  constraint pkey_first_third primary key (first_idx, third_idx)
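
Spelled out as runnable DDL it would look roughly like this (the
string column on third only shows up in the query below, so its type
here is a guess):

  create table first (
      idx serial primary key
  );

  create table third (
      idx bigserial primary key,
      string text not null  -- type assumed; only the name appears below
  );

  create table second (
      first_idx integer not null references first(idx) on delete cascade,
      third_idx bigint not null references third(idx) on delete cascade,
      constraint pkey_first_third primary key (first_idx, third_idx)
  );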

The common query will be:

select t.string
  from first f, second s, third t
 where f.idx = s.first_idx
   and s.third_idx = t.idx
   and f.idx = 4;   -- or whatever the key of interest is
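
Since first_idx is the leading column of pkey_first_third, that index
should already serve the f.idx = 4 restriction; running the query
under EXPLAIN ANALYZE will show whether the planner keeps using it as
the table grows:

  explain analyze
  select t.string
    from first f, second s, third t
   where f.idx = s.first_idx
     and s.third_idx = t.idx
     and f.idx = 4;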

So, I think the answer is that the data isn't going to be temporal or
otherwise segregated into subsets. I'll assume this is a lead-in for
partitions? The data will be queried very frequently (plan on a query
every 10 seconds), and I don't know in advance which idx ranges will
be involved. Would it be possible to partition this by the first_idx
value? Would it be an improvement?
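
For concreteness, the inheritance-plus-constraint-exclusion scheme
Joe describes, applied to second and keyed on first_idx ranges, might
look like the sketch below. The bucket width of 100 first_idx values
per child is an arbitrary assumption, and indexes are not inherited,
so each child needs its own:

  -- sketch only: carve second into ranges of first_idx
  create table second_p0 (
      check (first_idx >= 0 and first_idx < 100)
  ) inherits (second);

  create table second_p1 (
      check (first_idx >= 100 and first_idx < 200)
  ) inherits (second);

  -- primary keys are not inherited; give each child its own
  alter table second_p0 add primary key (first_idx, third_idx);
  alter table second_p1 add primary key (first_idx, third_idx);

  -- route inserts to the right child (application-side, or via a
  -- rule/trigger on the parent), then let the planner skip
  -- non-matching children:
  set constraint_exclusion = on;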
