COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'
Date: 2021-05-05 15:30:30
Message-ID: f901c82d-5144-4d3e-92a0-54207cf1d9a7@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

From time to time, I need to deal with bizarrely delimited text files,
having to use tools such as awk/sed/perl to reformat the files
so that they can be copied into PostgreSQL.

If such files could be imported to a table with a single text column,
we could then use PostgreSQL's now really fast regex-engine
to do data cleaning/reformatting, instead of having to rely on external awk-ward tools.

Furthermore, sometimes you don't want to clean/reformat the data at all,
but simply import the text lines "as is" without modifications,
such as when wanting to import unformatted log files,
where the log lines can contain any characters.

Could it be an idea to exploit the fact that DELIMITER E'\n' is currently an error?

ERROR: COPY delimiter cannot be newline or carriage return

That is, to change E'\n' to be a valid delimiter, which would simply read each line
delimited by newlines, as a single column.

The hack I'm currently abusing is to find some one-byte character that is not present anywhere in the text file,
and then to use that character as a delimiter. This doesn't work when needing to deal with a text file
which content is unknown at the time when writing the code though, so it's mostly useful for throwaway one-off queries.

Thoughts?

/Joel

MySQL seems to already support using \n as a delimiter (I haven't verified it myself though) [1]

[1] https://stackoverflow.com/questions/18394620/postgres-import-file-that-has-columns-separated-by-new-lines

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2021-05-05 15:48:19 Re: pg_receivewal makes a bad daemon
Previous Message Craig Ringer 2021-05-05 15:17:13 Re: Is txid_status() actually safe? / What is 011_crash_recovery.pl testing?