
Re: Best practice to load a huge table from ORACLE to PG

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Adonias Malosso <malosso(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Best practice to load a huge table from ORACLE to PG
Date: 2008-04-27 13:01:46
Message-ID: Pine.GSO.4.64.0804270833430.11601@westnet.com
Lists: pgsql-performance
On Sat, 26 Apr 2008, Adonias Malosso wrote:

> The current approach is to dump the data in CSV and than COPY it to 
> Postgresql.

To get a properly targeted answer here, you'd have to say what you don't
like about the current approach and which parts most need to improve for
your priorities.
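
For reference, the single-session version of what you're doing now
presumably looks something like this (table and file names are
placeholders):

    COPY big_table FROM '/path/to/oracle_dump.csv' WITH CSV;

That pushes the entire load through one backend process, which is part of
why the parallel ideas below can help.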

> Id like to know whats the best practice to LOAD a 70 milion rows, 101 
> columns table from ORACLE to PGSQL.

There is no one best practice.  There's a wide variety of techniques on 
both the Oracle and PostgreSQL side in this area that might be used 
depending on what trade-offs are important to you.

For example, if the goal was to make a dump of a single table run as fast
as possible, you'd want to look into techniques
that dumped that table with multiple sessions going at once, each handling 
a section of that table.  Typically you'd use one session per CPU on the 
server, and you'd use something with a much more direct path into the data 
than SQL*PLUS.  Then on the PostgreSQL side, you could run multiple COPY 
sessions importing at once to read this data all back in, because COPY 
will bottleneck at the CPU level before the disks will if you've got 
reasonable storage hardware.
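
A rough sketch of the PostgreSQL side of that, assuming the Oracle dump
has already been split into one CSV chunk per session (the file and table
names here are made up):

    # one COPY session per pre-split chunk, roughly one per CPU on the server
    for f in big_table_part_*.csv ; do
        psql -d targetdb -c "\copy big_table from '$f' with csv" &
    done
    wait

Each psql process gets its own backend, so the CSV parsing cost is spread
across cores instead of being serialized in one session.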

There's a list of utilities in this area at
http://www.orafaq.com/wiki/SQL*Loader_FAQ#Is_there_a_SQL.2AUnloader_to_download_data_to_a_flat_file.3F 
that you might look to for inspiration.  I know the WisdomForce FastReader
handles simultaneous multi-section dumps via a very direct path to the
data.

...but that's just one example based on one set of priorities, and it will 
be expensive in terms of dollars and complexity.

As another example of something that changes things considerably, if
there's any data with errors that will cause COPY to abort, you might
consider a different approach on the PG side.
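
One common option is to COPY into a staging table made entirely of text
columns, then move the rows that validate into the real table with explicit
casts, so values that won't cast or validate cleanly don't abort a
multi-hour load.  A rough sketch, with placeholder column names:

    -- all-text staging table so malformed values don't kill the COPY itself
    CREATE TABLE big_table_staging (id text, created_at text /* ...etc... */);
    COPY big_table_staging FROM '/path/to/oracle_dump.csv' WITH CSV;

    -- move the rows that pass validation, casting as you go
    INSERT INTO big_table
        SELECT id::integer, created_at::timestamp /* ...etc... */
        FROM big_table_staging
        WHERE id ~ '^[0-9]+$';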

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
