Re: Date format for bulk copy

From: David Rysdam <drysdam(at)ll(dot)mit(dot)edu>
To:
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Date format for bulk copy
Date: 2004-10-13 17:43:00
Message-ID: 416D6924.8090509@ll.mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Greg Stark wrote:

>David Rysdam <drysdam(at)ll(dot)mit(dot)edu> writes:
>
>
>
>>In my brute force port, I just bulk copied the date
>>fields into temporary tables and then did a to_timestamp(field, 'Mon DD YYYY
>>HH:MI:SS:MSAM').
>>
>>
>
>
>
>>Again, I created a temporary table and did a decode(field, 'hex') to the
>>real table.
>>
>>
>
>This is the standard approach. You're rather lucky these are the only
>data representation changes you've had to do so far. I fear you'll run into
>more and more complex changes over time and trying to avoid the temporary
>table will get harder and harder.
>
>
>
No, I think I'm OK there. These are programmatically-generated values
and I've already been through them all once. Just the millisecond issue
and the hex binary issue AFAIK.

>If it were me I would consider processing the files in perl. It should be
>pretty easy to do both of these modifications very quickly.
>
>
>
Very quick and easy to do one time. A little trickier to handle in an
elegant, maintainable way for the dozens of data reloads I do every
month for GBs of data onto two different server types.

>If you really want to go with a custom C code then you might be able to just
>grab the byteain/byteaout functions from src/backend/util/adt/varlena into a
>separate module and create new functions with modified names. Load it with
>CREATE FUNCTION byteain ... AS 'my_bytea_funcs.so' 'my_byteain';
>
>Or maybe create the function as my_byteain in postgres and then update the
>catalog entries somehow. I'm not sure how to do that but it shouldn't be too
>hard. And it might make it easier to do the substitution for the data load and
>then undo the change afterwards.
>
>
>
Why not create a type and then define the load function to be the
equivalent of "decode('hex')"?

>Doing the same for timmestamp is a bit trickier but you could copy
>ParseDateTime from datetime.c as a static function for your module.
>
>Be careful though, test this out thoroughly on a test database. I'm not sure
>of all the impacts of altering the in/out functions for data types. I expect
>it would break pg_dump, for example. And I would worry about the statistics
>tables too.
>
>
>
This is kind of a hybrid of my suggestions and the problems are a hybrid
as well. :)

1) Just change the timestamp type so that it allows a ':' delimiter for
milliseconds. Potential problems: Other parts of the code won't expect
it. People don't want that.

2) Create a new type. Potential problem: Things like date ranges
probably wouldn't work anymore, since the server wouldn't know it's a
date now.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Robby Russell 2004-10-13 17:59:42 Re: Level of replication support?
Previous Message David Rysdam 2004-10-13 17:32:01 Re: Date format for bulk copy