Re: Timestamp input + copy

From: "Kevin Bartz" <kbartz(at)loyaltymatrix(dot)com>
To: "'Chris Smith'" <cdsmith(at)twu(dot)net>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Timestamp input + copy
Date: 2004-07-28 23:06:56
Message-ID: 20040728231235.A2F2F40084@omta12.mta.everyone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Chris! Thanks so much for your reply. Now that I think about it, I guess
you're right. I would rather know about what's going on in my data set than
simply be oblivious to it, even though SQL Server happened to guess
correctly in this case. Come to think of it, it wasn't until I tried to use
Postgres that I even knew my dates were formatted like that! Anyway, I
retract my prior statement.

Regarding your suggestion: is there any way I can make it go a little
faster? I used sed extensively for dates back in my MySQL days (MySQL is
VERY finicky about its datetime input), but I have upwards of 10 GB of data
like this, with the ill-formatted example I gave you on most of the lines.
At the rate it's going now, sed's going to need more than an hour to strip
away all the PMs and AMs.

Is there any way I can tell Postgres about the format to expect for date
strings, in the same manner I would tell the function to_timestamp (i.e.,
'YYYY-MM-DD HH24:MI:SS')?

Alternatively, I could load the bad column as a string and then tell
Postgres to recast it (using the "using" clause) with to_timestamp. The
syntax listed in the documentation, however,

alter table bonusticket
alter submit_date type timestamp using to_timestamp(substring(submit_date,
1, 19), 'YYYY-MM-DD HH24:MI:SS');

throws a syntax error, it explains, at the word "type." I have the latest
version, 7.4.3 on a 64-bit SuSE box. Any suggestions?

Kevin

-----Original Message-----
From: Chris Smith [mailto:cdsmith(at)twu(dot)net]
Sent: Wednesday, July 28, 2004 3:51 PM
To: Kevin Bartz
Subject: Re: [GENERAL] Timestamp input + copy

Kevin Bartz wrote:
> I have a flat file with a column with dates formatted like this:
>
> 2004-04-15 18:04:26 PM
>
> It's a bit strange, I know, but I didn't create the file. My idea of
> Postgres's proper behavior would be to load this date as a military
> time (and ignore the "PM"). MS SQL Server behaves in this way.

I couldn't disagree more that it would be correct behavior to ignore the PM
and treat it as 24-hour time. It's one of the most important features of a
database that when you give is bad data, it responds with an error message
rather than trying to guess at what you mean. Why are you using a database,
if not to ensure that you can trust your data; and how can you trust data
that
comes from an ambiguous source?

I'm sorry to hear that SQL Server accepts this input without complaint.
It's
caused me to lose confidence in that product.

> What can I do about
> this? Can I possibly specify a time format (similar to the
> 'YYYY-MM-DD HH24:MI:SS' I might pass to to_timestamp) at load time?

If you know that the time is in 24-hour form and want to ignore the AM or PM
specifier, then you can certainly run it by a processor written in pretty
much
any programming language that will fix it. In UNIX sed, it looks like this
(all on one line):

cat data.txt | sed '/s/([0-9]{2,4}\-[0-9]{1,2}\-[0-9]{1,2}
[0-9]{1,2}\:[0-9]{1,2}:[0-9]{1,2}) ((AM|PM))?/\1/g' > data.txt.fixed

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Steve Crawford 2004-07-28 23:15:42 Re: Timestamp input + copy
Previous Message Chris Smith 2004-07-28 23:00:57 Re: Timestamp input + copy