Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "Isaac Morland" <isaac(dot)morland(at)gmail(dot)com>
Cc: "Chapman Flack" <chap(at)anastigmatix(dot)net>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'
Date: 2021-05-05 18:35:03
Message-ID: 45428207-d977-4c15-94b3-d85dd2911e10@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 5, 2021, at 19:58, David G. Johnston wrote:
> On Wed, May 5, 2021 at 10:34 AM Isaac Morland <isaac(dot)morland(at)gmail(dot)com> wrote:
>> On Wed, 5 May 2021 at 13:23, Chapman Flack <chap(at)anastigmatix(dot)net> wrote:
>>> On 05/05/21 13:02, David G. Johnston wrote:
>>> > Why not just allow: "DELIMITER NONE" to be valid syntax meaning exactly
>>> > what it says and does exactly what you desire?
>>>
>>> What would it mean? That you get one column, multiple rows of text
>>> corresponding to "lines" delimited by something, or that you get one
>>> column, one row of text for the entire content of the file?
>>
>> It means no column delimiter. In other words, there is no character which marks the end of a data value, so the entire line is a single data value.
>>
>
> This. When dealing with COPY it's expected that each line becomes its own row. On the server you can do pg_read_file() if you need the entire file to be considered a single value. psql (\I and variables) is a bit more hackey, but I'd rather see that improved directly anyway if the goal is to try and make getting the "whole document" easier - copy isn't the right API for that IMO.

I think you misunderstood the problem.
I don't want the entire file to be considered a single value.
I want each line to become its own row, just a row with a single column.

So I actually think COPY seems like a perfect match for the job,
since it does precisely that, except there is no delimiter in this case.

I'm currently using the pg_read_file()-hack in a project,
and even though it can read files up to 1GB,
using e.g. regexp_split_to_table() to split on E'\n'
seems to need 4x as much memory, so it only
works with files less than ~256MB.

SELECT COUNT(*) FROM regexp_split_to_table(repeat(E'\n',1000000000),E'\n');
ERROR: invalid memory alloc request size 4000000004
Time: 4151.374 ms (00:04.151)

/Joel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-05-05 18:35:45 Re: MaxOffsetNumber for Table AMs
Previous Message Andres Freund 2021-05-05 18:28:43 Re: RFC: Detailed reorder buffer stats dumps