Re: Importing normalised data by SQL script in remote DB

From: "Sean Davis" <sdavis2(at)mail(dot)nih(dot)gov>
To: Andreas <maps(dot)on(at)gmx(dot)net>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Importing normalised data by SQL script in remote DB
Date: 2008-04-16 11:04:24
Message-ID: 264855a00804160404l1f3f6322ofd3cd5757c3b361a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Tue, Apr 15, 2008 at 11:01 PM, Andreas <maps(dot)on(at)gmx(dot)net> wrote:
> Sean Davis schrieb:
>
>
>
> > On Tue, Apr 15, 2008 at 9:54 PM, Andreas <maps(dot)on(at)gmx(dot)net> wrote:
> >
> >
> > > Hi,
> > > I've got to import data into a remote database.
> > > I get some stuff usually as excel-files that doesn't fit the
> db-structure
> > > in respect of normalisation so I import it into Access, brush up the
> data
> > > and push the columns from there in the right tables via ODBC.
> > >
> > > For bigger imports (~5000 lines in Excel that get spread over 4-6
> tables in
> > > my db) I fetch the remote DB, do the import locally and transfer a dump
> back
> > > to the remote site when noone uses the server.
> > > This is getting unwieldy as the db grows and the connection is not
> really
> > > that fast.
> > >
> > > I can access the server by ssh so it might be way faster to run a
> prepared
> > > SQL file that consists just of the new data against the DB on the
> console.
> > >
> > > Could I build a SQL script that adds the new stuff and creates foreign
> keys
> > > on the fly without the need to know the new IDs before?
> > >
> > > Lets suppose I got a couple of foreign-linked tables:
> > > company (company_id serial primary key, name, ...)
> > > person (person_id serial primary key, company_fk, name, ...)
> > > contact (contact_id serial primary key, person_fk, contact_date,
> > > contact_notes, ...)
> > >
> > > The SQL script needed to:
> > > 1) INSERT INTO company ...
> > > 2) look up the last created company_id
> > > 3) INSERT INTO person and use the company_id to set company_fk
> > > 4) look up the last created person_id
> > > 5) INSERT INTO contact and use the person_id to set person_fk
> > > Probaply create some other persons with their contacts.
> > > Then another company and so on.
> > >
> > > I do this right now with vba in Access but this is painfully slow via
> > > InterNet as it constantly has to ask the server for the last ids and the
> > > Net's latency spoils all the fun.
> > >
> > >
> >
> > Why not avoid Access altogether. Load the data into postgres "as-is"
> > into a set of loader tables. Then, you can use SQL to do the data
> > munging and inserts, including the foreign key relationships. You can
> > use \copy in psql to load the data, so there is no need to have access
> > to the server.
> >
> > Sean
> >
> >
> >
> Thanks for your suggestion.
> As said usually I get the data as a xls with wrong column names and often
> enough wrong column types, missing infos or cluttered up in creative ways,
> too.
> How could I easily import a certain tab within a xls into a temporary
> pg-table?

Well, you would need to convert to tab-delimited text, an easy
operation. Then, create a table based on the structure of the xls
file; you can use varchar columns here if you like and then cast to
the correct datatypes later.

> Then often the data is a dump of JOINs so it is not normalised.
> I can't help it. I just get the stuff dumped over me as our customers are
> able or willing to provide.
> Most of the time it's like this:
> ...
> company_1, ..., person_1, ...
> company_1, ..., person_2, ...
> company_1, ..., person_3, ...
> company_2, ..., person_4, ...
> company_3, ..., person_5, ...
> company_3, ..., person_6, ...
> ...
> I've got to catch the repeating companies to avoid doubles.
> Until now I try to find some columns that represent a key for the company
> part of the data.
> Then sort by this key columns, walk over the table and generate just a
> company whenever a part of the key changes.

You could do all that, but that would be pretty suboptimal. How about
a table like this:

create table company_table (
id serial primary key,
company_name varchar unique
);

insert into table company_table(name) select distinct(company_name)
from tmp_table where distinct(company_name) not in (select
company_name from company_table);

This is UNTESTED, but you can hopefully get the idea. You can use SQL
to take care of finding unique values (distinct), generating IDs by
looking that the company name is not in the company table already
(again using SQL), using casts and string functions, etc.

Hope that is enough direction to get you started.
Sean

> AFAIK I need to insert the person records right after I created their
> company because if I first create all companies and then the people I
> wouldn't know the company-id that belongs to a person.
>
> I'd love to learn how to do this more cleverly w/o Access.
>
>
> regards
> Andreas
>

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Guido Barosio 2008-04-16 13:55:36 Re: Inserting a record returned from a function
Previous Message p9e883002 2008-04-16 07:46:54 Inserting a record returned from a function