Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "Isaac Morland" <isaac(dot)morland(at)gmail(dot)com>, "Chapman Flack" <chap(at)anastigmatix(dot)net>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'
Date: 2021-05-06 05:52:11
Message-ID: a8a39fbf-cf4f-4292-b994-fe997a45946c@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 5, 2021, at 21:51, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net <mailto:andrew%40dunslane.net>> writes:
> > On 5/5/21 2:45 PM, Tom Lane wrote:
> >> Yeah, that's because of the conversion to "chr". But a regexp
> >> is overkill for that anyway. Don't we have something that will
> >> split on simple substring matches?
>
> > Not that I know of. There is split_part but I don't think that's fit for
> > purpose here. Do we need one, or have I missed something?
>
> [ checks manual ... ]
>
> string_to_array or string_to_table would do, I think.
>
> regards, tom lane
>

Thanks for these new functions, they seem really useful for a lot of cases.

However, I see two problems with using string_to_table() for this particular use-case.

- Doesn't work with files larger than 1GB, due to pg_read_file()'s limit.
- 68% slower than using the COPY-hack.

% ls -lah foo.txt
-rw-r--r-- 1 joel staff 623M May 6 07:31 foo.txt

% wc -l foo.txt
6771864 foo.txt

# \d txt
Table "public.txt"
Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
line | text | | |

# COPY txt (line) FROM 'foo.txt' DELIMITER '"';
COPY 6771864
Time: 9829.707 ms (00:09.830)
Time: 9552.286 ms (00:09.552)
Time: 9483.115 ms (00:09.483)

# TRUNCATE txt;
TRUNCATE TABLE

# INSERT INTO txt (line) SELECT string_to_table(pg_read_file('foo.txt'),E'\n');
INSERT 0 6771865
Time: 16556.078 ms (00:16.556)
Time: 14720.343 ms (00:14.720)
Time: 17266.088 ms (00:17.266)

/Joel

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Kellerer 2021-05-06 06:10:20 Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'
Previous Message vignesh C 2021-05-06 05:52:09 Re: Replication slot stats misgivings