Re: large database

From: Chris Travers <chris(dot)travers(at)gmail(dot)com>
To: Mihai Popa <mihai(at)lattica(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: large database
Date: 2012-12-11 10:40:59
Message-ID: CAKt_ZfsY3qf+mR5tDrV_Oj_4fa+MrM5q2rNLRK_oNP92df+C-A@mail.gmail.com
Lists: pgsql-general

On Mon, Dec 10, 2012 at 12:26 PM, Mihai Popa <mihai(at)lattica(dot)com> wrote:

> Hi,
>
> I've recently inherited a project that involves importing a large set of
> Access mdb files into a Postgres or MySQL database.
> The process is to export the mdb's to comma separated files, then import
> those into the final database.
> We are now at the point where the csv files are all created and amount
> to some 300 GB of data.
>

Ok, this doesn't sound like anything that would require MySQL-only
features.

>
> I would like to get some advice on the best deployment option.
>
> First, the project has been started using MySQL. Is it worth switching
> to Postgres and if so, which version should I use?
>

Probably. I say probably because there are costs. See below.

>
> Second, where should I deploy it? The cloud or a dedicated box?
>

The cloud works best for low-end databases; once you need high disk I/O it
breaks down, and a large database usually implies significant disk I/O.

>
> Amazon seems like the sensible choice; you can scale it up and down as
> needed and backup is handled automatically.
>

Keep in mind there is a huge complexity cost there, when it comes to
maintaining adequate performance.

> I was thinking of an x-large RDS instance with 10000 IOPS and 1 TB of
> storage. Would this do, or will I end up with a larger/ more expensive
> instance?
>
> Alternatively I looked at a Dell server with 32 GB of RAM and some
> really good hard drives. But such a box does not come cheap and I don't
> want to keep the pieces if it doesn't cut it
>

Without knowing your expected usage pattern, I can't estimate the memory
needed. I would however suggest the first thing to do is to estimate the
frequently used set of data under normal operations, and then multiply that
by a reasonable factor to give you reasonable assurance that the data you
want stays in cache.

Secondly you want to look at expected number of parallel queries and plan
number of processors around that.
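The two sizing steps above can be sketched as a back-of-envelope calculation. Every number below is an illustrative assumption, not a measurement from your workload; plug in your own figures once you have them:

```python
import math

# Back-of-envelope sizing for a PostgreSQL box, following the two
# steps above. All inputs are placeholder assumptions.

GB = 1024 ** 3

# Step 1: estimate the frequently used ("hot") portion of the data
# and multiply by a safety factor so it stays in cache.
total_data = 300 * GB          # ~300 GB of source data, per the thread
hot_fraction = 0.05            # assume ~5% of the data is touched routinely
safety_factor = 2.0            # headroom so the hot set stays cached

hot_set = total_data * hot_fraction
ram_target = hot_set * safety_factor
print(f"hot set: {hot_set / GB:.0f} GB, RAM target: {ram_target / GB:.0f} GB")
# -> hot set: 15 GB, RAM target: 30 GB

# Step 2: plan CPU cores around expected concurrent queries.
peak_parallel_queries = 12     # assumed peak concurrency
cores = math.ceil(peak_parallel_queries * 1.25)  # headroom for background work
print(f"suggested cores: {cores}")
# -> suggested cores: 15
```

Under those (made-up) assumptions a 32 GB box covers the cache target; the point is the method, not the specific numbers.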

As for costs of switching there are some other things you need to be aware
of:

1: MySQL and PostgreSQL have completely different assumptions about
standard table usage, and these impact indexing strategies and bulk data
performance. You can expect to do some relearning regarding performance
tuning and indexing (typically you create a lot more indexes in MySQL than
in PostgreSQL; this also tends to make bulk inserts a bit faster on
PostgreSQL).
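Since the immediate task is loading 300 GB of CSV, the PostgreSQL-friendly pattern is worth spelling out: load with COPY first, build indexes afterward. A hypothetical sketch (database, table, and file names are placeholders):

```shell
# Load CSV data via COPY, then build indexes in one pass afterward.
# "mydb", "orders", and "orders.csv" are placeholder names.
psql -d mydb -c "CREATE TABLE orders (id bigint, customer text, total numeric);"

# \copy streams the file from the client side; server-side COPY
# works too if the file is on the database host.
psql -d mydb -c "\copy orders FROM 'orders.csv' WITH (FORMAT csv, HEADER true)"

# Building the index after the load populates it in a single pass,
# which is much faster than maintaining it row by row during the load.
psql -d mydb -c "CREATE INDEX orders_customer_idx ON orders (customer);"
```

The same idea applies if the tables already exist: drop or defer secondary indexes before the bulk load and recreate them afterward.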

2: MySQL and PostgreSQL have completely different assumptions about what
you are doing with your data. MySQL assumes it's a single app db.
PostgreSQL assumes it is a multi-app db. This explains a lot of sql_mode
weirdness in MySQL and why the PostgreSQL team will never go down that
path. It also means you typically have more mature tools for
manipulating data inside the db (via views, functions, and the like) than
you do in MySQL, but MySQL has some additional flexibility if you are just
porting to it from another db.

If it were me I would definitely migrate. It sounds like these are
internal line-of-business databases, and they may come from different MS
Access apps. For this reason, I think you will have more options regarding
centralizing your code and better managing your front-end access (I assume
through MS Access?) in PostgreSQL than you would in MySQL. However the
choice is yours.

Best Wishes,
Chris Travers
