Re: [GENERAL] Desperately Seeking Regular Expression (fwd)

From: Thomas Good <tomg(at)q8(dot)nrnet(dot)org>
To: Herouth Maoz <herouth(at)oumail(dot)openu(dot)ac(dot)il>
Cc: PostgreSQL List <pgsql-general(at)postgresql(dot)org>
Subject: Re: [GENERAL] Desperately Seeking Regular Expression (fwd)
Date: 1999-04-29 12:31:32
Message-ID: Pine.SV4.3.91.990429082937.17671C-100000@q8.nrnet.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, 29 Apr 1999, Herouth Maoz wrote:

> At 16:35 +0300 on 27/04/1999, Thomas Good wrote:

> > I've had success previously doing a port - but from FoxPro which
> > allows one to dump data delimited by tabs. Unfortunately, PROGRESS
> > dumps fields delimited by whitespace rather than tabs and I can find no
> > documentation on how to alter this behaviour.
> >
> > I read the recent post wherein someone used awk to change whitespace
> > to tabs:
> >
> > cat $input | awk '{ print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t" \
> > $6"\t"$7"\t" }' > $input.out
> >
> > I am using this with good effect. However, I run into trouble as
> > inside my dump file(s) there are doublequoted character strings.
> > awk is changing the whitespace delimited words inside the char strs
> > into tab delimited words inside strings. Ouch.

> I have a feeling that you are missing additional points. For example, if
> you want to use the resulting text as input for COPY, strings should not be
> delimited within quotes. And possible tabs and newlines and backslashes
> within the file should be properly preceded with "\".

Hi Herouth! How goes it? I'm assuming you got Solaris and PG happy.
I'm wrestling with UnixWare now - just got the backend to compile, now
fighting with ecpg...(Bruce M is probably rather weary of me by now!)
Anyway -

Thanks for jumping into the breach...what I do is this:

0) use awk to create tabs where whitespace exists as a field separator.
1) use perl to tr tabs back to whitespace within double quoted strings.
2) use sed to change "" to \N (PROGRESS nulls are idiosyncratic/idiotic)
3) use sed to change ? (the PROGRESS unknown value) to \N
4) use sed to strip the remaining single quotes

And that is essentially it. The dump file loads correctly.
MANY thanks to Oliver E for sending me a perl script that
completes step two...(this is a very elegant solution).
Thanks also to Adriaan Joubert who also sent me some code.
(Merci!) Here is Olly's script (I call this from an awk
script then redirect the output - crude but effective):

#!/usr/bin/perl
$input = $ARGV[0];
open (DUMPFILE, "$input.out");
while (<DUMPFILE>) {
if ($_ =~ '"') {
@ln = split /"/, $_;
$i = 0;
foreach $elem (@ln) {
if ( $i % 2) {
print '"';
$elem =~ tr/\t/ /;
print $elem;
print '"';
} else {
print $elem;
}
$i++;
}
} else {
print $_;
}
}
close DUMPFILE;

It is a relief to have:
1) my dump files ready for loading into PG
2) the UnixWare PG binaries almost ready for primetime.

When this is finished we will have moved my entire shop (the Dept of
Psychiatry - we span two hospital campuses) from PROGRESS on UnixWare
and FoxPro on DOS to PG on Linux/FreeBSD. PG on UnixWare is an
interim step. Once the database is ported I am going to commence working
on losing UnixWare...

Nice talking to you - be well!
Tom
----
North Richmond Community Mental Health Center
---
Thomas Good tomg@ { admin | q8 } .nrnet.org
Phone: 718-354-5528
Fax: 718-354-5056
Powered By: Slackware 3.6 PostgreSQL 6.3.2
---
/* Die Wahrheit Ist Irgendwo Da Draussen... */

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jim Mercer 1999-04-29 12:48:06 2 gig limitation?
Previous Message Thomas Good 1999-04-29 12:12:02 Re: [GENERAL] UnixWare chokes on ECPG