Re: Recommended Protocol: Adding Rows to Table

From: Scott Mead <scottm(at)openscg(dot)com>
To: Rich Shepard <rshepard(at)appl-ecosys(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Recommended Protocol: Adding Rows to Table
Date: 2011-11-25 20:00:50
Message-ID: CAKq0gv+eCtxN1jP2TDCYj+MDp-OTkdGMkdQiL7pavWJeDvh2Fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Nov 25, 2011 at 11:05 AM, Rich Shepard <rshepard(at)appl-ecosys(dot)com>wrote:

> The data originated in a spreadsheet and, based on my experience, contains
> duplicate records. After reformatting there are 143,260 rows to insert in
> the table. The approach I tried seems to have problems (explained below)
> and
> I would like to learn the proper way to insert rows in either an empty
> table
> or one with existing rows since I'll need to do this procedure for my
> projects.
>
> The table was created with the primary key and I used INSERT INTO ... to
> load the data. Many duplicate records, so I split the file into smaller
> ones
> and re-ran the command to load them. I'd then remove the reported duplicate
> rows from the source (text) file. Between runs, I'd issue the DROP TABLE
> command within psql and check it was empty using 'select count(*) from
> waterchem;'.
>
> It appeared that I removed duplicates from the first couple of smaller
> files so I combined them into one file named ok.sql. But, when I tested the
> combined file it, too, reported many duplicate records. Something wrong
> here.
>
> Perhaps a better approach is to put the CREATE TABLE command above the
> INSERT INTO section of the file (without specifying a PK), load that using
> the command 'psql -d <database> -f waterchem.sql', then add the PK and
> remove duplicates as postgres presents them. Or, perhaps there is a much
> more efficient way to accomplish this task.
>
> Rather than my flailing around and spending a lot of time failing to load
> all non-duplicate rows into the table I'd like to learn the proper way to
> accomplish this task. Suggestions, recommendations, and your experiences
> are
> requested.
>

Why don't you first load the data into a table (no primary key), then use
SQL to find your dups?

once loaded:
SELECT <primary_key_column>, count(1) from <table> group by 1 having
count(1) > 1;

At least then, you'll really know what you're in for. You can either
script a DELETE or... whatever you want to do, once clean, you can add the
PK.

--
Scott

>
> TIA,
>
> Rich
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/**mailpref/pgsql-general<http://www.postgresql.org/mailpref/pgsql-general>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alpha Beta 2011-11-25 20:04:38 .dmp files in postgresql
Previous Message Adrian Klaver 2011-11-25 19:17:15 Re: Recommended Protocol: Adding Rows to Table