Re: Suggestions for a data-warehouse migration routine

From: John A Meinel <john(at)arbash-meinel(dot)com>
To: Richard Rowell <richard(at)bowmansystems(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Mark Rinaudo <mark(at)bowmansystems(dot)com>
Subject: Re: Suggestions for a data-warehouse migration routine
Date: 2005-04-27 17:23:56
Message-ID: 426FCAAC.4040000@arbash-meinel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Richard Rowell wrote:
> I've ported enough of my companies database to Postgres to make
> warehousing on PG a real possibility. I thought I would toss my data
> migration architecture ideas out for the list to shoot apart..
>
> 1. Script on production server dumps the production database (MSSQL) to
> a set of delimited text files.
> 2. Script on production server moves files via FTP to a Postgres
> database server.
> 3. File Alteration Monitor trigger on PG server executes script when
> last file is transferred.
> 4. Script on PG server drops the target database (by issuing a "dropdb"
> command).
> 5. Script on PG server re-creates target database. (createdb command)
> 6. Script on PG server re-creates the tables.
> 7. Script on PG server issues COPY commands to import data.
> 8. Script on PG server indexes tables.
> 9. Script on PG server builds de-normalized reporting tables.
> 10. Script on PG server indexes the reporting tables.
> 11. Script on PG server creates needed reporting functions.
> 12. Vacuum analyze?
>
> My question revolves around the drop/create for the database. Is their
> significant downside to this approach? I'm taking this approach because
> it is simpler from a scripting point of view to simply start from
> scratch on each warehouse update. If I do not drop the database I would
> need to delete the contents of each table and drop all indexes prior to
> the COPY/data import. My assumption is all the table deletes and index
> drops would be more expensive then just droping/re-creating the entire
> database.

I believe you are correct. If you are going to completely wipe the
database, just drop it and re-create. Deleting is much slower than
dropping. (One of the uses of partitioning is so that you can just drop
one of the tables, rather than deleting the entries). Dropping the whole
db skips any Foreign Key checks, etc.

>
> Also, is the Vacuum analyze step needed on a freshly minted database
> where the indexes have all been newly created?
>
> Thanks in advance for all feedback.

ANALYZE is needed, since you haven't updated any of your statistics yet.
So the planner doesn't really know how many rows there are.

VACUUM probably isn't since everything should be pretty well aligned.

John
=:->

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Anjan Dave 2005-04-27 17:48:15 Why is this system swapping?
Previous Message John A Meinel 2005-04-27 17:20:37 Re: Final decision