Skip site navigation (1) Skip section navigation (2)

Re: Export tab delimited from mysql to postgres.

From: Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Export tab delimited from mysql to postgres.
Date: 2004-10-12 07:20:21
Message-ID: opsfqxb7stcq72hf@musicbox (view raw or flat)
Thread:
Lists: pgsql-sql
A tested example in Python :
Basically it counts the \t and accumulates the lines until it has enough  
and then prints the line.
Note : as an exercise you could add a test so that there are exactly  
(columns-1) delimiters and not >=(columns-1).

def grouplines( in_stream, columns, delimiter ):
	num_delimiters = columns - 1
	accum = ''
	for line in in_stream:
		accum += line
		if accum.count( delimiter ) >= num_delimiters:
			print accum.replace( "\n", "\\n" )
			accum = ''
			
	if accum:
		print "Last line unterminated."

grouplines( open( 'data.in' ), 3, "\t" ):


Input data (I added a column over your example):
1       What a day!     A
2       What a week it has
been!   B
3       What the!       C

Output :
1       What a day!     A\n
2       What a week it has\nbeen!       B\n
3       What the!       C

Have fun with your copy !

On Tue, 12 Oct 2004 15:33:46 +1000, Theo Galanakis  
<Theo(dot)Galanakis(at)lonelyplanet(dot)com(dot)au> wrote:

>
> Thanks for all your comments,
>
> 	I have beent trying the insert within a transaction block, however
> it does not seem to reduce the time it takes to process each records.  
> Mind
> you there are 80 column and the insert statement explicitly defines the
> column to insert into.
>
> 	I need any tip I can get help me transform the text file into a
> format postgres copy will successfully read.
>
>
> Here is sample of the current format of a mysql tab delimited dump..
>
> columnA	columnB
>
> 1		What a day!
> 2		What a week it has
> been!
> 3		What the!
>
> As you can see row 2 has a value that holds a CR value which ends up
> wrapping around onto the third line. Postgres copy command does not like
> this and mysql is unable to replace the value with another type of
> delimiter, like a \r.
>
> So I gather I have to some how manually replace the carriage return with
> something postgres understand \r...
>
>
> columnA	columnB
>
> 1		What a day!
> 2		What a week it has \r been!
> 3		What the!
>
> How do I do this without getting a text file that looks like this
>
> 1	What a day! \r\n2		What a week it has \r been!\r\n3
> What the!\r\n
>
> Any help would be appreciated.
>
> Theo
>
> -----Original Message-----
> From: Christopher Browne [mailto:cbbrowne(at)acm(dot)org]
>
> Sent: Tuesday, 12 October 2004 10:46 AM
> To: pgsql-sql(at)postgresql(dot)org
> Subject: Re: [SQL] Export tab delimited from mysql to postgres.
>
>
> Quoth Theo(dot)Galanakis(at)lonelyplanet(dot)com(dot)au (Theo Galanakis):
>>         Could you provide a example of how to do this?
>>
>>         I actually ended up exporting the data as Insert statements,
>> which strips out cf/lf within varchars. However it takes an eternity
>
>> to import 200,000 records... 24 hours infact???? Is this normal?
>
> I expect that this results from each INSERT being a separate transaction.
>
> If you put a BEGIN at the start and a COMMIT at the end, you'd doubtless  
> see
> an ENORMOUS improvement.
>
> That's not even the _big_ improvement, either.  The _big_ improvement  
> would
> involve reformatting the data so that you could use the COPY statement,
> which is _way_ faster than a bunch of INSERTs.  Take a look at the
> documentation to see the formatting that is needed:
>
> http://techdocs.postgresql.org/techdocs/usingcopy.php
> http://www.faqs.org/docs/ppbook/x5504.htm
> http://www.postgresql.org/docs/7.4/static/sql-copy.html
> --
>
> output = ("cbbrowne" "@" "ntlug.org")
> http://www3.sympatico.ca/cbbrowne/lsf.html
> Question: How many surrealists does it take to change a light bulb?
>
> Answer: Two, one to hold the giraffe, and the other to fill the bathtub
>         with brightly colored machine tools.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>
>
> ______________________________________________________________________
> This email, including attachments, is intended only for the addressee
> and may be confidential, privileged and subject to copyright.  If you
> have received this email in error, please advise the sender and delete
> it.  If you are not the intended recipient of this email, you must not
> use, copy or disclose its content to anyone.  You must not copy or
>
> communicate to others content that is confidential or subject to
>
> copyright, unless you have the consent of the content owner.



In response to

pgsql-sql by date

Next:From: Fang GenjieDate: 2004-10-12 08:01:05
Subject: the problem of createlang!
Previous:From: Theo GalanakisDate: 2004-10-12 05:33:46
Subject: Re: Export tab delimited from mysql to postgres.

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group