Re: Export tab delimited from mysql to postgres.

From: Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Export tab delimited from mysql to postgres.
Date: 2004-10-12 07:20:21
Message-ID: opsfqxb7stcq72hf@musicbox
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql


A tested example in Python :
Basically it counts the \t and accumulates the lines until it has enough
and then prints the line.
Note : as an exercise you could add a test so that there are exactly
(columns-1) delimiters and not >=(columns-1).

def grouplines( in_stream, columns, delimiter ):
num_delimiters = columns - 1
accum = ''
for line in in_stream:
accum += line
if accum.count( delimiter ) >= num_delimiters:
print accum.replace( "\n", "\\n" )
accum = ''

if accum:
print "Last line unterminated."

grouplines( open( 'data.in' ), 3, "\t" ):

Input data (I added a column over your example):
1 What a day! A
2 What a week it has
been! B
3 What the! C

Output :
1 What a day! A\n
2 What a week it has\nbeen! B\n
3 What the! C

Have fun with your copy !

On Tue, 12 Oct 2004 15:33:46 +1000, Theo Galanakis
<Theo(dot)Galanakis(at)lonelyplanet(dot)com(dot)au> wrote:

>
> Thanks for all your comments,
>
> I have beent trying the insert within a transaction block, however
> it does not seem to reduce the time it takes to process each records.
> Mind
> you there are 80 column and the insert statement explicitly defines the
> column to insert into.
>
> I need any tip I can get help me transform the text file into a
> format postgres copy will successfully read.
>
>
> Here is sample of the current format of a mysql tab delimited dump..
>
> columnA columnB
>
> 1 What a day!
> 2 What a week it has
> been!
> 3 What the!
>
> As you can see row 2 has a value that holds a CR value which ends up
> wrapping around onto the third line. Postgres copy command does not like
> this and mysql is unable to replace the value with another type of
> delimiter, like a \r.
>
> So I gather I have to some how manually replace the carriage return with
> something postgres understand \r...
>
>
> columnA columnB
>
> 1 What a day!
> 2 What a week it has \r been!
> 3 What the!
>
> How do I do this without getting a text file that looks like this
>
> 1 What a day! \r\n2 What a week it has \r been!\r\n3
> What the!\r\n
>
> Any help would be appreciated.
>
> Theo
>
> -----Original Message-----
> From: Christopher Browne [mailto:cbbrowne(at)acm(dot)org]
>
> Sent: Tuesday, 12 October 2004 10:46 AM
> To: pgsql-sql(at)postgresql(dot)org
> Subject: Re: [SQL] Export tab delimited from mysql to postgres.
>
>
> Quoth Theo(dot)Galanakis(at)lonelyplanet(dot)com(dot)au (Theo Galanakis):
>>         Could you provide a example of how to do this?
>>
>>         I actually ended up exporting the data as Insert statements,
>> which strips out cf/lf within varchars. However it takes an eternity
>
>> to import 200,000 records... 24 hours infact???? Is this normal?
>
> I expect that this results from each INSERT being a separate transaction.
>
> If you put a BEGIN at the start and a COMMIT at the end, you'd doubtless
> see
> an ENORMOUS improvement.
>
> That's not even the _big_ improvement, either. The _big_ improvement
> would
> involve reformatting the data so that you could use the COPY statement,
> which is _way_ faster than a bunch of INSERTs. Take a look at the
> documentation to see the formatting that is needed:
>
> http://techdocs.postgresql.org/techdocs/usingcopy.php
> http://www.faqs.org/docs/ppbook/x5504.htm
> http://www.postgresql.org/docs/7.4/static/sql-copy.html
> --
>
> output = ("cbbrowne" "@" "ntlug.org")
> http://www3.sympatico.ca/cbbrowne/lsf.html
> Question: How many surrealists does it take to change a light bulb?
>
> Answer: Two, one to hold the giraffe, and the other to fill the bathtub
> with brightly colored machine tools.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>
>
> ______________________________________________________________________
> This email, including attachments, is intended only for the addressee
> and may be confidential, privileged and subject to copyright. If you
> have received this email in error, please advise the sender and delete
> it. If you are not the intended recipient of this email, you must not
> use, copy or disclose its content to anyone. You must not copy or
>
> communicate to others content that is confidential or subject to
>
> copyright, unless you have the consent of the content owner.

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Fang Genjie 2004-10-12 08:01:05 the problem of createlang!
Previous Message Theo Galanakis 2004-10-12 05:33:46 Re: Export tab delimited from mysql to postgres.