Re: "Ungroup" data for import into PostgreSQL

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: George Weaver <gweaver(at)shaw(dot)ca>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: "Ungroup" data for import into PostgreSQL
Date: 2015-01-16 00:56:00
Message-ID: 54B861A0.9060709@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 1/15/15 9:43 AM, George Weaver wrote:
> Hi List,
>
> I need to import data from a large Excel spreadsheet into a PostgreSQL table. I have a program that uses ODBC to connect to Excel and extract data using SQL queries. The program then inserts the data into a PostgreSQL table.
>
> The challenge with this particular spreadsheet is that it is arrayed thus:
>
> Model No 1 Product Code 15
> Serial No 1 No on Hand
> Serial No 2 No on Hand
> Serial No 3 No on Hand
> Model No 4 Product Code 9
> Serial No 12 No on Hand
> Model No 5 Product Code 27
> Serial No 6 No on Hand
> Serial No 14 No on Hand
>
> etc.
>
> I need the data in PostgreSQL arrayed thus
>
> Model No 1 Product Code 15 Serial No 1 No on Hand
> Model No 1 Product Code 15 Serial No 2 No on Hand
> Model No 1 Product Code 15 Serial No 3 No on Hand
> Model No 4 Product Code 9 Serial No 12 No on Hand
> Model No 5 Product Code 27 Serial No 6 No on Hand
> Model No 5 Product Code 27 Serial No 14 No on Hand
>
> I can import the data procedurely using plpgsql to match the individual rows to the master for each row (import the raw data into a buffer table in PostgreSQL and then looping through the rows in the buffer table and checking to see when the Model No changes).

Note that if you're doing that you better be putting the rownumber from excel into the table... result sets are NOT guaranteed to be in insert order!

> I'm wondering if there is a more elegant way to do this using straight sql from Excel?

Well, that's really an excel question, not a Postgres question...

If you load the whole spreadsheet into a single table and have a way to differentiate between the different rows then you might be able to do something with CTE's to relate a serial number to the product code. That might be faster than plpgsql.

You might also be able to do something creative with formulas in excel to copy the product code data to the serial # rows. You could then import the whole thing and re-normalize it.

There's probably some stuff you could do with VBA too. If you care about performance you don't want to execute SQL statements for each spreadsheet row.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2015-01-16 01:02:06 Re: "Ungroup" data for import into PostgreSQL
Previous Message Jim Nasby 2015-01-16 00:50:03 Re: Simple Atomic Relationship Insert