Re: New "raw" COPY format

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: New "raw" COPY format
Date: 2024-11-04 18:34:28
Message-ID: CAD21AoC91_jaycZZ9xyqc9=Fr9DTR=3PmZ4p0XZogvAPw-YbCg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Nov 2, 2024 at 4:08 AM Joel Jacobson <joel(at)compiler(dot)org> wrote:
>
> On Fri, Nov 1, 2024, at 22:28, Masahiko Sawada wrote:
> > As I mentioned in a separate email, if we use the OS default EOL as
> > the default EOL in raw format, it would not be necessary to allow it
> > to be multi-character. I think it's worth considering.
>
> I like the idea, but not sure I understand how it would work.
>
> What if a user's OS default is \n (LF) and this user wants
> to import a Windows text file that uses \r\n (CR LF), which is a
> multi-character EOL delimiter?
>
> Was your idea to make an exception for that particular EOL,
> or to simply not support that edge case?

IIUC, the text and csv formats already support it. We start from the
EOL_UNKNOWN state and guess the EOL marker while parsing the line. I
think we can do something similar in the raw format, except that we
won't need to care about quotes and escapes there.
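
To illustrate the guess-while-parsing idea, here is a minimal sketch in
Python rather than the C of the actual COPY code. The names Eol,
detect_eol, and split_raw_lines are hypothetical, and the sketch assumes
the whole input is in memory and that, once the marker is guessed, any
other CR or LF byte is treated as ordinary data. It is not how the
existing text/csv parser is implemented, only the general shape of it.

```python
from enum import Enum


class Eol(Enum):
    """Hypothetical stand-in for the parser's EOL states."""
    UNKNOWN = "unknown"
    NL = b"\n"
    CR = b"\r"
    CRNL = b"\r\n"


def detect_eol(data: bytes) -> Eol:
    """Guess the EOL marker from the first line terminator in data."""
    for i, byte in enumerate(data):
        if byte == 0x0A:  # LF seen first
            return Eol.NL
        if byte == 0x0D:  # CR seen first: CRLF if followed by LF, else bare CR
            if data[i + 1:i + 2] == b"\n":
                return Eol.CRNL
            return Eol.CR
    return Eol.UNKNOWN  # no terminator found at all


def split_raw_lines(data: bytes):
    """Split data into lines using the guessed EOL; no quotes or escapes."""
    eol = detect_eol(data)
    if eol is Eol.UNKNOWN:
        return ([data] if data else []), eol
    lines = data.split(eol.value)
    if lines[-1] == b"":
        lines.pop()  # a trailing terminator does not start a new line
    return lines, eol
```

With this, a Windows file is handled on a system whose default is LF
without the user naming a multi-character EOL: the first CR LF pair
switches the state from UNKNOWN to CRNL and the rest of the input is
split on that marker.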

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
