Re: Importing normalised data by SQL script in remote DB

From: Andreas <maps(dot)on(at)gmx(dot)net>
To: Sean Davis <sdavis2(at)mail(dot)nih(dot)gov>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Importing normalised data by SQL script in remote DB
Date: 2008-04-16 03:01:42
Message-ID: 48056C16.3040205@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Sean Davis schrieb:
> On Tue, Apr 15, 2008 at 9:54 PM, Andreas <maps(dot)on(at)gmx(dot)net> wrote:
>
>> Hi,
>> I've got to import data into a remote database.
>> I get some stuff usually as excel-files that doesn't fit the db-structure
>> in respect of normalisation so I import it into Access, brush up the data
>> and push the columns from there in the right tables via ODBC.
>>
>> For bigger imports (~5000 lines in Excel that get spread over 4-6 tables in
>> my db) I fetch the remote DB, do the import locally and transfer a dump back
>> to the remote site when noone uses the server.
>> This is getting unwieldy as the db grows and the connection is not really
>> that fast.
>>
>> I can access the server by ssh so it might be way faster to run a prepared
>> SQL file that consists just of the new data against the DB on the console.
>>
>> Could I build a SQL script that adds the new stuff and creates foreign keys
>> on the fly without the need to know the new IDs before?
>>
>> Lets suppose I got a couple of foreign-linked tables:
>> company (company_id serial primary key, name, ...)
>> person (person_id serial primary key, company_fk, name, ...)
>> contact (contact_id serial primary key, person_fk, contact_date,
>> contact_notes, ...)
>>
>> The SQL script needed to:
>> 1) INSERT INTO company ...
>> 2) look up the last created company_id
>> 3) INSERT INTO person and use the company_id to set company_fk
>> 4) look up the last created person_id
>> 5) INSERT INTO contact and use the person_id to set person_fk
>> Probaply create some other persons with their contacts.
>> Then another company and so on.
>>
>> I do this right now with vba in Access but this is painfully slow via
>> InterNet as it constantly has to ask the server for the last ids and the
>> Net's latency spoils all the fun.
>>
>
> Why not avoid Access altogether. Load the data into postgres "as-is"
> into a set of loader tables. Then, you can use SQL to do the data
> munging and inserts, including the foreign key relationships. You can
> use \copy in psql to load the data, so there is no need to have access
> to the server.
>
> Sean
>
>
Thanks for your suggestion.
As said usually I get the data as a xls with wrong column names and
often enough wrong column types, missing infos or cluttered up in
creative ways, too.
How could I easily import a certain tab within a xls into a temporary
pg-table?

Then often the data is a dump of JOINs so it is not normalised.
I can't help it. I just get the stuff dumped over me as our customers
are able or willing to provide.
Most of the time it's like this:
...
company_1, ..., person_1, ...
company_1, ..., person_2, ...
company_1, ..., person_3, ...
company_2, ..., person_4, ...
company_3, ..., person_5, ...
company_3, ..., person_6, ...
...
I've got to catch the repeating companies to avoid doubles.
Until now I try to find some columns that represent a key for the
company part of the data.
Then sort by this key columns, walk over the table and generate just a
company whenever a part of the key changes.

AFAIK I need to insert the person records right after I created their
company because if I first create all companies and then the people I
wouldn't know the company-id that belongs to a person.

I'd love to learn how to do this more cleverly w/o Access.

regards
Andreas

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message p9e883002 2008-04-16 07:46:54 Inserting a record returned from a function
Previous Message Sean Davis 2008-04-16 02:05:51 Re: Importing normalised data by SQL script in remote DB