Skip site navigation (1) Skip section navigation (2)

Modifying COPY TO

From: "Dave Held" <dave(dot)held(at)arrayservicesgrp(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Modifying COPY TO
Date: 2005-02-25 22:19:09
Message-ID: 49E94D0CFCD4DB43AFBA928DDD20C8F902618457@asg002.asg.local (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
I am interested in hacking COPY TO such that one can specify that
rows are copied in a certain index order.  I got as far as 
src/backend/commands/copy.c:CopyTo(), and it looks like I would need
to modify the call to heap_beginscan() so that it uses a key.  However,
I couldn't figure out how to provide one, or if I'm even looking at the
right area.  Ideally, this behavior would be specified with a flag,
perhaps: "WITH INDEX <index_name>" or "WITH PRIMARY KEY"
or something similar.

The motivation for this change is as follows.  I have a fairly large
database (10 million+ records) that mirrors the data in a proprietary
system.  The only access to that data is through exported flat files.
Currently, those flat files are copied directly into a staging area in the 
db via a COPY FROM, the actual tables are truncated, and the
staging data is inserted into the live tables.  Since the data is read-only,
it doesn't matter that it is recreated every day.  However, as you
can imagine, the import process takes quite a while (several hours).
Also, rebuilding the db from scratch every day loses any statistical
information gathered from the execution of queries during the day.

A possibility that I would like to pursue is to keep the staging data
from the previous day, do a COPY TO, import the new data into
another staging table with a COPY FROM, then export the fresh
data with another COPY TO.  Then, I can write a fast C/C++
program to do a line-by-line comparison of each record, isolating
the ones that have changed from the previous day.  I can then
emit those records in a change file that should be relatively small
and easy to update.  Of course, this scheme can only work if
COPY TO emits the records in a reliable order.

Any assistance on this project would be greatly appreciated.  The
best I can see, I'm stuck on line 1053 from copy.c:

    scandesc = heap_beginscan(rel, mySnapshot, 0, NULL);

I suspect that I want it to look like this:

    scandesc = heap_beginscan(rel, mySnapshot, 1, key);

where 'key' is an appropriately constructed ScanKey.  It looks
like I want to call ScanKeyEntryInitialize(), but I'm not sure what
parameters I need to pass to it to get an index or the primary
key.  I mostly need help building the ScanKey object.  I think I 
can figure out how to hack the custom option, etc.  I should 
mention that I am using the 7.4.7 codebase on Linux 2.4.

David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129


pgsql-hackers by date

Next:From: Tom LaneDate: 2005-02-25 22:56:50
Subject: Re: Modifying COPY TO
Previous:From: Bruce MomjianDate: 2005-02-25 22:09:20
Subject: Re: Development schedule

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group