Re: Performance for relative large DB

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance for relative large DB
Date: 2005-08-24 16:34:51
Message-ID: 607jebpa38.fsf@dba2.int.libertyrms.com

"tobbe" <tobbe(at)tripnet(dot)se> writes:
> Hi Chris.
>
> Thanks for the answer.
> Sorry that i was a bit unclear.
>
> 1) We update around 20.000 posts per night.

No surprise there; I would have been surprised to see 100/nite or
6M/nite...

> 2) What i meant was that we suspect that the DBMS called PervasiveSQL
> that we are using today is much to small. That's why we're looking for
> alternatives.
>
> Today we base our solution much on using querry-specific tables created
> at night, so instead of doing querrys direct on the "post" table (with
> 4-6M rows) at daytime, we have the data pre-aligned in several much
> smaller tables. This is just to make the current DBMS coop with our
> amount of data.
>
> What I am particulary interested in is if we can expect to run all our
> select querrys directly from the "post" table with PostgreSQL.

Given a decent set of indices, I'd expect that to work OK... Whether
it's 4M or 6M rows, that's a pretty moderate table size.

If rows pass through specific states that are "of interest," then you
can get big wins out of having partial indices... Consider...

create index partial_post_status on posts (status)
  where status in ('Active', 'Pending', 'Locked');
-- When processing of a posting is completely finished, it winds up with 'Closed' status

We have some 'stateful' tables in our environment where the
interesting states are 'P' (where work is "pending") and 'C' (where
all the work has been completed and the records are never of interest
again except as ancient history); the partial index "where status =
'P'" winds up being incredibly helpful.

It's worth your while to dump data out from Pervasive and load it into
a PostgreSQL instance and to do some typical sorts of queries on the
PostgreSQL side.

Do "EXPLAIN ANALYZE [some select statement];" and you'll get a feel
for how PostgreSQL is running the queries.
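For instance (the query and the created_on column are made up, purely
to show the shape of it):

explain analyze
  select *
    from posts
   where status = 'Active'
     and created_on >= '2005-08-01';
-- The output shows which plan was chosen (seq scan vs. index scan),
-- the planner's row estimates, and the actual time spent in each node.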

Fiddling with indices to see how that affects things will also be a
big help.

You may find there are columns with large cardinalities (quite a lot
of unique values) where you want to improve the stats analysis via...

alter table posts alter column [whatever] set statistics 100;
-- the default statistics target is 10
analyze posts;
-- re-run ANALYZE so the new target takes effect
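If you want to see what the planner ends up knowing about the column,
you can peek at the pg_stats view afterwards (again, 'posts' is just
the example table name):

select attname, n_distinct, most_common_vals
  from pg_stats
 where tablename = 'posts';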

> 3) How well does postgres work with load balancing environments. Is
> it built-in?

Load balancing means too many things. Can you be more specific about
what you consider it to mean?

For Internet registry operations, we use replication (Slony-I) to
create replicas that take particular sorts of load off the "master"
systems.

But you might be referring to something else...

For instance, connection pools are also sometimes used for load
balancing. They can be implemented inside applications (everyone doing
Java has one or more favorite Java connection pool implementations), in
web servers (Apache has a DB connection pool manager), or in an outside
application (pgpool, a C-based connection pool manager).
--
(reverse (concatenate 'string "gro.mca" "@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/postgresql.html
In case you weren't aware, "ad homineum" is not latin for "the user of
this technique is a fine debater." -- Thomas F. Burdick
