Re: Modifying COPY TO

From: Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Modifying COPY TO
Date: 2005-02-26 04:07:41
Message-ID: slrnd1vtgd.2dej.andrew+nonews@trinity.supernews.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2005-02-25, "Dave Held" <dave(dot)held(at)arrayservicesgrp(dot)com> wrote:
> A possibility that I would like to pursue is to keep the staging data
> from the previous day, do a COPY TO, import the new data into
> another staging table with a COPY FROM, then export the fresh
> data with another COPY TO. Then, I can write a fast C/C++
> program to do a line-by-line comparison of each record, isolating
> the ones that have changed from the previous day. I can then
> emit those records in a change file that should be relatively small
> and easy to update.

I have an application that does something like this, but rather than use an
external program, I do the comparison in the database itself:

- import data from external system into a temporary table
- compare the temporary table against the live data (a full outer join
is a convenient way of doing this - I create an index on the temp table
first)
- perform insert/update/delete for each record that was added, changed
or removed

In my case the compare/update is in a pl/pgsql function. My data is only
2-3 million rows, a bit smaller than yours, but I have to update hourly,
not daily, and spend no more than 5-10 minutes on each update (currently
I can do it in 5: 2 to load the data, 3 to do the compare/update).

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-02-26 04:46:39 Re: Modifying COPY TO
Previous Message Jim C. Nasby 2005-02-26 02:35:02 Re: idea for concurrent seqscans