Skip site navigation (1) Skip section navigation (2)

Re: Database conversion woes...

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Database conversion woes...
Date: 2004-02-03 19:59:04
Message-ID: 60y8rkkk8n.fsf@dev6.int.libertyrms.info (view raw or flat)
Thread:
Lists: pgsql-performance
kevin(at)valenfor(dot)com ("Kevin Carpenter") writes:
> I am doing a massive database conversion from MySQL to Postgresql for a
> company I am working for.  This has a few quirks to it that I haven't
> been able to nail down the answers I need from reading and searching
> through previous list info.
>
> For starters, I am moving roughly 50 seperate databases which each
> one represents one of our clients and is roughly 500 megs to 3 gigs
> in size.  Currently we are using the MySQL replication, and so I am
> looking at Mammoths replicator for this one.  However I have seen it
> only allows on DB to be replicated at a time.  With the size of each
> single db, I don't know how I could put them all together under one
> roof, and if I was going to, what are the maximums that Postgres can
> handle for tables in one db?  We track over 2 million new points of
> data (records) a day, and are moving to 5 million in the next year.

I'll be evasive about replication, because the answers are pretty
painful :-(, but as for the rest of it, nothing about this sounds
challenging.

There is room for debate here as to whether you should have:
 a) One postmaster and many database instances, 
 2) One postmaster, one (or a few) database instances, and do the
    client 'split' via schemas, or
 iii) Several postmasters, many database instances.

Replication will tend to work best with scenario 2), which minimizes
the number of connections that are needed to manage replication;
that's definitely a factor worth considering.

It is also possible for it to be worthwhile to spread vastly differing
kinds of activity across different backends so that they can have
separate buffer caches.  If all the activity is shared across one
postmaster, that means it is all shared across one buffer cache, and
there are pathological situations that are occasionally observed in
practice where one process will be "trashing" the shared cache,
thereby injuring performance for all other processes using that back
end.  In such a case, it may be best to give the "ill-behaved" process
its own database instance with a small cache that it can thrash on
without inconveniencing others.

Jan Wieck is working on some improvements for buffer management in 7.5
that may improve the situation vis-a-vis buffering, but that is
certainly not something ready to deploy in production just yet.

> Second what about the physical database size, what are the limits
> there?  I have seen that it was 4 gig on Linux from a 2000 message,
> but what about now?  Have we found way's past that?

There's NO problem with having enormous databases now; each table is
represented as one or more files (if you break a size barrier, oft
configured as 1GB, it creates an "extent" and extends into another
file), and for there to be problems with this, the problems would be
_really crippling_ OS problems.
-- 
(format nil "~S(at)~S" "cbbrowne" "cbbrowne.com")
http://www3.sympatico.ca/cbbrowne/linuxxian.html
"We come to bury DOS, not to praise it."
-- Paul Vojta <vojta(at)math(dot)berkeley(dot)edu>, paraphrasing a quote of
Shakespeare

In response to

pgsql-performance by date

Next:From: Anjan DaveDate: 2004-02-03 20:58:22
Subject: Compile Vs RPMs
Previous:From: Richard HuxtonDate: 2004-02-03 19:29:52
Subject: Re: Database conversion woes...

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group