Re: [OT?] ETL tools

From: "Roger Hand" <RHand(at)kailea(dot)com>
To: "Jose Gonzalez Gomez" <jgonzalez(dot)openinput(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: [OT?] ETL tools
Date: 2005-08-24 09:05:25
Message-ID: DB28E9B548192448A4E8C8A3C1B1E475611C93@sj1-exch-01.us.corp.kailea.com
Lists: pgsql-general

Jose Gonzalez wrote:
> The situation is a bit chaotic as they're using
> a lot of local Access databases, some databases hosted in an old
> version of Microsoft SQL Server and a lot of data in other non
> relational files (SPSS, Excel, ...). I was hoping to impose a bit of
> order and I started installing a current version of PostgreSQL to host
> all the databases they're using.
> ...
> Maybe I could try another approach?

Personally, I would write code (Java or whatever) to do the work. There will almost certainly be cases where you need special data massaging, or special rules for special cases, and those are a lot easier to handle when you are in complete control of what happens. I would be afraid that an ETL tool:

1. would have a tedious learning curve, or
2. would turn out (after x hours) not to do something you absolutely need it to do.
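As an illustration of the "special rules for special cases" point, here is a minimal sketch of per-value cleanup written in plain Java. The class, the method names and the particular placeholder values treated as missing ("n/a", "-") are hypothetical examples, not rules from the original post; the one grounded detail is that an Access Yes/No field stores True as -1.

```java
import java.util.Locale;

public class MassageRules {
    // Hypothetical rule: legacy sources often use blanks or placeholder strings
    // for missing values; map them to null so Postgres stores a real NULL.
    public static String normalize(String raw) {
        if (raw == null) return null;
        String t = raw.trim();
        if (t.isEmpty() || t.equalsIgnoreCase("n/a") || t.equals("-")) return null;
        return t;
    }

    // Hypothetical rule: yes/no columns from Access or Excel arrive in many spellings.
    // Access itself stores True as -1, so that value is accepted too.
    public static Boolean toBoolean(String raw) {
        String t = normalize(raw);
        if (t == null) return null;
        switch (t.toLowerCase(Locale.ROOT)) {
            case "yes": case "y": case "true": case "1": case "-1":
                return Boolean.TRUE;
            case "no": case "n": case "false": case "0":
                return Boolean.FALSE;
            default:
                // Fail loudly rather than guess; this is the kind of case a tool might hide.
                throw new IllegalArgumentException("Unrecognized boolean: " + raw);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Smith ")); // Smith
        System.out.println(normalize("N/A"));      // null
        System.out.println(toBoolean("-1"));       // true
    }
}
```

Throwing on unknown values, instead of silently mapping them, is a deliberate choice here: on a one-off migration you usually want to see every surprise in the data.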

Then again, I haven't used any ETL tools (well, not for a long, long time), unless you count PGAdmin [http://www.pgadmin.org/].

The PGAdmin-II app had an excellent MS SQL Server -> Postgres data conversion plug-in. I used it many, many times with zero problems, with both SQL Server 7 and 2000. Unfortunately, last I checked, the current PGAdmin-III app doesn't seem to have or support this plug-in, and IIRC PGAdmin-II doesn't work with Postgres 8. If this converter somehow became available again you could give it a shot, but I don't think it supported much beyond straight table copies.

The problem with writing the code yourself is that you'll need to run it on a platform that can access all the data sources. I've used Java for these types of tasks.

Postgres, of course, has a JDBC driver, so there's no problem there. MS SQL Server 2000 has a Microsoft JDBC driver; SQL Server 7 did not come with one, so for that version I used a commercial driver I bought. If you're using a pre-2000 version of SQL Server you will need to hunt up a JDBC driver. (Actually, there is also the JDBC-ODBC bridge ... it is not recommended for any kind of real-world use, but it might work for a one-time pull.)
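Once both sides have a JDBC driver, a straight table copy is mostly the same loop regardless of the source. A minimal sketch, assuming the source and destination tables have the same column names in the same order and that `setObject`/`getObject` map the types acceptably (across vendors they may not, which is where per-column massaging would go). `TableCopier`, `insertSql` and `copy` are hypothetical names; column names are not quoted, so mixed-case Access or SQL Server names would need extra handling for Postgres.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TableCopier {
    // Builds "INSERT INTO t (a, b) VALUES (?, ?)" for the given column names.
    public static String insertSql(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    // Copies every row of srcTable into dstTable in batches of batchSize (must be > 0),
    // inside a single destination transaction that is rolled back on a SQL error.
    public static int copy(Connection src, Connection dst, String srcTable,
                           String dstTable, int batchSize) throws SQLException {
        dst.setAutoCommit(false);
        int copied = 0;
        try (Statement st = src.createStatement();
             ResultSet rs = st.executeQuery("SELECT * FROM " + srcTable)) {
            ResultSetMetaData md = rs.getMetaData();
            int n = md.getColumnCount();
            List<String> cols = new ArrayList<>();
            for (int i = 1; i <= n; i++) cols.add(md.getColumnName(i));
            try (PreparedStatement ins = dst.prepareStatement(insertSql(dstTable, cols))) {
                while (rs.next()) {
                    for (int i = 1; i <= n; i++) {
                        // Per-column special-case rules would be applied here.
                        ins.setObject(i, rs.getObject(i));
                    }
                    ins.addBatch();
                    if (++copied % batchSize == 0) ins.executeBatch();
                }
                ins.executeBatch(); // flush the final partial batch
            }
            dst.commit();
        } catch (SQLException e) {
            dst.rollback();
            throw e;
        }
        return copied;
    }

    public static void main(String[] args) {
        // Prints: INSERT INTO customers (id, name) VALUES (?, ?)
        System.out.println(insertSql("customers", Arrays.asList("id", "name")));
    }
}
```

The two `Connection`s would come from `DriverManager.getConnection` with each driver's own URL (for PostgreSQL, something like `jdbc:postgresql://localhost/mydb`).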

I've successfully read Excel data from Java using the free Java Excel API [http://www.andykhan.com/jexcelapi/index.html].

I've used Sun's sun.jdbc.odbc.JdbcOdbcDriver driver to access MS Access files. Again, this goes through the bridge, but for just reading data it would probably prove adequate.
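For reference, opening an Access file through the bridge looks roughly like this. A sketch, assuming Windows with the Microsoft Access ODBC driver installed and a JDK that still ships the bridge (it was removed in Java 8); the DSN-less `Driver={Microsoft Access Driver (*.mdb)};DBQ=...` string is the usual form, and the class name and file path are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class AccessViaOdbc {
    // DSN-less ODBC connection URL for an .mdb file (no ODBC data source setup needed).
    public static String accessUrl(String mdbPath) {
        return "jdbc:odbc:Driver={Microsoft Access Driver (*.mdb)};DBQ=" + mdbPath;
    }

    // Loads Sun's JDBC-ODBC bridge driver and opens a connection.
    // Only works on Windows with a JDK that still includes the bridge (JDK 7 or earlier).
    public static Connection open(String mdbPath) throws ClassNotFoundException, SQLException {
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        return DriverManager.getConnection(accessUrl(mdbPath));
    }

    public static void main(String[] args) {
        // Hypothetical path; prints the URL that open() would pass to DriverManager.
        System.out.println(accessUrl("C:/data/legacy.mdb"));
    }
}
```

The resulting `Connection` can then be used as the source side of an ordinary JDBC copy loop into Postgres.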

Good luck!

-Roger

> Jose
