Re: BUG #12320: json parsing with embedded double quotes

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Aaron Botsis <aaron(at)bt-r(dot)com>
Cc: postgres(at)bt-r(dot)com, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #12320: json parsing with embedded double quotes
Date: 2015-01-06 18:52:28
Message-ID: CA+bJJbz2EfCb-7OyizwE8e3b9MnZy9kyKkEdwrfdZ+aVsYCfog@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

Hi Aaron:

On Tue, Jan 6, 2015 at 7:06 PM, Aaron Botsis <aaron(at)bt-r(dot)com> wrote:

> Hi Francisco, I’m aware, but still consider this to be a bug, or at least
> a great opportunity for an enhancement. :)
>

Maybe, but you are going to have a problem.

> This had bitten me for the third time while trying to import some json
> data. It’d be great to bypass the copy escaping (and possibly other meta
> characters) when the column type is json or jsonb. I’d be happy to try and
> write it and submit a patch if folks believe this is an acceptable way to
> go… That said, I should probably read what the process is for this kind of
> thing :)
>

Reading this, you are talking about 'the column being json'. COPY needs to
do the escaping at the same time it's constructing the columns. The present
way is easy to do: read character by character; if it's an escape, process
the next character, accumulating it into the current field; otherwise, see
whether it is a field or record separator and act accordingly. It's also
layered: when you construct the records for COPY, you take all the field
data, turn each into an escaped string, join them with the field separator,
and emit them followed by a record separator ( practical implementations may
do this virtually ).
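The two layers described above can be sketched roughly as follows. This is a minimal Python illustration of the idea, not PostgreSQL's actual implementation; the function names are invented for the example:

```python
# Sketch of COPY text-format escaping, as two symmetric layers.
# Writing: escape each field first, then join with separators.
# Reading: scan char by char, resolving escapes into the current field.

def escape_copy_field(value: str) -> str:
    """Escape backslash, tab, newline and carriage return for the text format."""
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

def make_copy_record(fields: list[str]) -> str:
    # The escaping never needs to know the column type: a json value with
    # embedded newlines/tabs is just another string to escape.
    return "\t".join(escape_copy_field(f) for f in fields) + "\n"

def parse_copy_record(line: str) -> list[str]:
    """Char-by-char reader: escapes accumulate into the current field."""
    unescape = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\"}
    fields, cur = [], []
    i = 0
    while i < len(line):
        c = line[i]
        if c == "\\":                       # escape: process the next char
            i += 1
            cur.append(unescape.get(line[i], line[i]))
        elif c == "\t":                     # field separator
            fields.append("".join(cur))
            cur = []
        elif c == "\n":                     # record separator
            break
        else:
            cur.append(c)
        i += 1
    fields.append("".join(cur))
    return fields
```

Note that the round trip preserves a json field containing a literal newline precisely because the escaping layer runs below any notion of column types.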
Intermixing this with an 'I'm in a json column' state would require passing
information down from the upper layer, making the code more complex and,
especially, more error-prone. What do you do if ( using the standard
delimiters ) your json value has embedded newlines and tabs ( which, IIRC,
are legal in several places inside the json )? And all this just to make
some incorrectly formatted files readable ( when they can be correctly
formatted with a perl one liner or something similar ). I'm not the one to
decide, but I would vote against including that ( but do not trust me too
much: I would also vote against including 'csv', which I consider the root
of many evils ).
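As an aside, the "fix the file before loading" approach the paragraph alludes to can be sketched like this. This is a hedged Python example, assuming the input holds one JSON document per line; the function name is invented:

```python
# Sketch: pre-escape a file of JSON documents (one per line) so it can be
# fed to COPY in text format, instead of teaching COPY about json columns.
import json

def preprocess(lines):
    for line in lines:
        doc = json.loads(line)        # validate the document
        compact = json.dumps(doc)     # re-serialize without raw newlines/tabs
        # Escape COPY metacharacters; json's own backslashes must be doubled.
        yield (compact.replace("\\", "\\\\")
                      .replace("\t", "\\t")
                      .replace("\n", "\\n")) + "\n"
```

The key point is that the fix lives entirely outside COPY: the loader still sees one escaped field per record, and no layering inside the server changes.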

Francisco Olarte.
