Re: raw output from copy

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, hlinnaka <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Golub <pavel(at)microolap(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: raw output from copy
Date: 2016-03-12 06:24:49
Message-ID: CAFj8pRDu01zic74KZkG2=7vBUBddRRb1haPrXsmuv3zKHPwVuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

2016-03-09 18:41 GMT+01:00 Corey Huinker <corey(dot)huinker(at)gmail(dot)com>:

>
>>> The regression tests seem to adequately cover all new functionality,
>>> though I wonder if we should add some cases that highlight situations where
>>> BINARY mode is insufficient.
>>>
>>>
> One thing I tried to test RAW was to load an existing json file.
>
> My own personal test was to load an existing .json file into a 1x1 bytea
> table, which worked. From there I was able to
> select encode(col_name,'escape')::text::jsonb from test_table
> and the json was correctly converted.
>
> A similar test copying binary failed.
>
> A write up of the test looks like this:
>
>
> \copy (select '{"foo": "bar"}') to '/tmp/raw_test.jsonb' (format raw);
> COPY 1
> create temporary table raw_byte (b bytea);
> CREATE TABLE
> create temporary table raw_text (t text);
> CREATE TABLE
> \copy raw_jsonb from '/tmp/raw_test.blob' (format raw);
> psql:/home/ubuntu/raw_test.sql:9: ERROR: relation "raw_jsonb" does not
> exist
> \copy raw_byte from '/tmp/raw_test.blob' (format raw);
> COPY 1
> select encode(b,'escape')::text::json from raw_byte;
> encode
> ----------------
> {"foo": "bar"}
> (1 row)
>
> \copy raw_text from '/tmp/raw_test.blob' (format raw);
> COPY 1
> select t::jsonb from raw_text;
> t
> ----------------
> {"foo": "bar"}
> (1 row)
>
> create temporary table binary_byte (b bytea);
> CREATE TABLE
> create temporary table binary_text (t text);
> CREATE TABLE
> \copy binary_byte from '/tmp/raw_test.blob' (format binary);
> psql:/home/ubuntu/raw_test.sql:22: ERROR: COPY file signature not
> recognized
> select encode(b,'escape')::jsonb from binary_byte;
> encode
> --------
> (0 rows)
>
> \copy binary_text from '/tmp/raw_test.blob' (format binary);
> psql:/home/ubuntu/raw_test.sql:26: ERROR: COPY file signature not
> recognized
> select t::jsonb from binary_text;
> t
> ---
> (0 rows)
>
>
> So, *if* we want to add a regression test to demonstrate to posterity why
> we need RAW for cases that BINARY can't handle, I offer the attached file.
>

I don't think so regress tests should to do this demonstration. It is
clean, so COPY BINARY should to fail every time, and then there is not any
benefit from it in regress tests. There are lot of discussion in this
thread, and we don't need to inject more "garbage" to regress tests.

>
> Does anyone else see value in adding that to the regression tests?
>

>
>
>> Before I give my approval, I want to read it again more closely to make
>>> sure that no cases were skipped with regard to the (binary || raw) and
>>> (binary || !raw) tests. Also, I want to use it on some of my problematic
>>> files. Maybe I'll find a good edge case. Probably not.
>>>
>>
> I don't know why I thought this, but when I looked at the patch, I assumed
> that the ( binary || raw ) tests were part of a large if/elseif/else
> waterfall. They are not. They stand alone. There are no edge cases to find.
>

This is organized to files by necessity to work with external files. The
regress tests for COPY RAW has about 100 lines - so why need special files
and infrastructure. COPY RAW, COPY BINARY tests well shares infrastructure.

>
> Review complete and passed. I can re-review if we want to add the
> additional test.
>
>
Great, thank you very much. I hope so this feature really useful. It allow
to simple export/import XML doc in different encodings, JSONs and can be
enhanced future via options. The nice feature (but not for this release)
can be additional cast info for import -- like "COPY table(jsonb_column)
FROM stdin (FORMAT RAW, CAST json_2_jsonb). Because there are the options,
there are big space for other enhancing.

Regards

Pavel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2016-03-12 06:58:56 Re: Explain [Analyze] produces parallel scan for select Into table statements.
Previous Message Dilip Kumar 2016-03-12 06:24:10 Re: Relation extension scalability