Re: Force lookahead in COPY FROM parsing

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Force lookahead in COPY FROM parsing
Date: 2021-04-06 17:50:11
Message-ID: 6e1305a7-09c0-1e1f-f2a3-68e641aab431@iki.fi
Lists: pgsql-hackers

On 02/04/2021 20:21, John Naylor wrote:
> I have nothing further so it's RFC. The patch is pretty simple compared
> to the earlier ones, but is worth running the fuzzer again as added
> insurance?

Good idea. I did that, and indeed it revealed bugs. If the client sent
just a single byte in one CopyData message, we only loaded that one byte
into the buffer, instead of the full 4 bytes needed for lookahead.
Attached is a new version that fixes that.
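
To illustrate the class of bug described above, here is a minimal sketch
in Python (not the actual C code from the patch). The names fill_buffer
and LOOKAHEAD are hypothetical, and each item yielded by the iterator
stands in for the payload of one CopyData message. The point is only that
the refill loop has to keep reading messages until the lookahead
requirement is met or the input ends, rather than stopping after one
message:

```python
from typing import Iterator

LOOKAHEAD = 4  # bytes of lookahead the parser wants, per the message above


def fill_buffer(buf: bytearray, messages: Iterator[bytes],
                need: int = LOOKAHEAD) -> bool:
    """Append whole messages to buf until it holds at least `need` bytes.

    Returns True if `need` bytes are available, False if the input
    ended first (the caller then has to cope with a short buffer).
    """
    while len(buf) < need:
        chunk = next(messages, None)
        if chunk is None:   # end of input reached before `need` bytes
            return False
        buf.extend(chunk)   # a message may be as short as a single byte
    return True


# Four one-byte messages still end up as a full lookahead window.
buf = bytearray()
ok = fill_buffer(buf, iter([b"a", b"b", b"c", b"d", b"e"]))
print(ok, bytes(buf))
```

A buggy variant would call next() once and return, which leaves a single
byte in the buffer when the client sends one byte per message.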

Unfortunately, that's not the end of it. Consider the byte sequence
"\.<NL><some invalid bytes>" appearing at the end of the input. We
should detect the end-of-copy marker \. and stop reading without
complaining about the garbage after the end-of-copy marker. That doesn't
work if we force 4 bytes of lookahead; the invalid byte sequence fits in
the lookahead window, so we will try to convert it.
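
The conflict can be shown with a toy model in Python. This is a sketch
under simplifying assumptions, not how the patch or the COPY code is
structured: UTF-8 decoding stands in for the encoding conversion, and
read_eager, read_marker_first, EOC and LOOKAHEAD are hypothetical names.

```python
EOC = b"\\.\n"   # end-of-copy marker followed by a newline
LOOKAHEAD = 4    # forced lookahead, in bytes


def read_eager(data: bytes) -> str:
    """Convert a full lookahead window before inspecting its contents."""
    window = data[:LOOKAHEAD]
    window.decode("utf-8")  # raises UnicodeDecodeError on invalid bytes
    return "end-of-copy" if window.startswith(EOC) else "data"


def read_marker_first(data: bytes) -> str:
    """Look for the marker before converting anything that follows it."""
    if data.startswith(EOC):
        return "end-of-copy"  # bytes after the marker are never converted
    data[:LOOKAHEAD].decode("utf-8")
    return "data"


tail = b"\\.\n\xff"  # marker, newline, then one invalid byte

print(read_marker_first(tail))
try:
    read_eager(tail)
except UnicodeDecodeError as exc:
    print("eager lookahead failed:", exc.reason)
```

In this model the invalid byte sits inside the 4-byte window, so the
eager variant reports an encoding error even though the marker has
already ended the input.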

I'm sure that can be fixed, for example by adding special handling for
the last few bytes of the input. But it needs some more thought, so this
patch isn't quite ready to be committed yet :-(.

- Heikki

Attachment Content-Type Size
v4-0001-Simplify-COPY-FROM-parsing-by-forcing-lookahead.patch text/x-patch 9.1 KB
