Re: generic copy options

From: Emmanuel Cecchet <manu(at)frogthinker(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Emmanuel Cecchet <manu(at)asterdata(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generic copy options
Date: 2009-09-21 02:24:28
Message-ID: 4AB6E3DC.3080803@frogthinker.org
Lists: pgsql-hackers

The easiest approach for both implementation and documentation might be to
have a matrix of options.
Each option has a row and a column in the matrix. The intersection of a
row and a column is set to 0 if the two options are not compatible and to 1
if they are. This way we are sure to capture all possible combinations,
and each time we add a new option, we just have to check in the
matrix whether it is compatible with the existing options. Note that
we can also replace the 0 with an index into an error message array.

I can provide an implementation of that if this looks interesting to anyone.
Emmanuel

Robert Haas wrote:
> On Sun, Sep 20, 2009 at 2:25 PM, Emmanuel Cecchet <manu(at)asterdata(dot)com> wrote:
>
>> Tom Lane wrote:
>>
>>> Emmanuel Cecchet <manu(at)asterdata(dot)com> writes:
>>>
>>>> Here you will force every format to use the same set of options
>>>>
>>> How does this "force" any such thing?
>>>
>>>
>> As far as I understand it, every format will have to handle every format
>> option that may exist, so that it can either implement it or throw an
>> error.
>>
>
> I don't think this is really true. To be honest with you, I think
> it's exactly backwards. The way the option-parsing logic works, we
> parse each option individually FIRST. Then at the end we do
> cross-checks to see whether there is an incompatibility in the
> combination specified. So if two different formats support the same
> option, we just change the cross-check to say that foo is OK with
> either format bar or format baz. On the other hand, if we split the
> option into bar_foo and baz_foo, then the first loop that does the
> initial parsing has to support both cases, and then you still need a
> separate cross-check for each one.
>
>
>> That would argue in favor of a format option that defines the format. Right
>> now I find it bogus to have to say (csv on, csv_header on). If csv_header is
>> on that should imply csv on.
>> The only problem I have is that it is not obvious what options are generic
>> COPY options and what are options of an option (like format options).
>> So maybe a tradeoff is to differentiate format specific options like in:
>> (delimiter '.', format csv, format_header, format_escape...)
>> This should also make clear if someone develops a new format what options
>> need to be addressed.
>>
>
> I think this is a false dichotomy. It isn't necessarily the case that
> every format will support a delimiter option either. For example, if
> we were to add an XML or JSON format (which I'm not at all convinced
> is a good idea, but I'm sure someone is going to propose it!) it
> certainly won't support specifying an arbitrary delimiter.
>
> IOW, *every* format will have different needs and we can't necessarily
> know which options will be applicable to those needs. But as long as
> we agree that we won't use the same option for two different
> format-specific options with wildly different semantics, I don't think
> that undecorated names are going to cause us much trouble. It's also
> less typing.
>
>
>> PS: I don't know why but as I write this message I already feel that Tom
>> hates this new proposal :-D
>>
>
> I get that feeling sometimes myself. :-) Anyway, FWIW, I think Tom
> has analyzed this one correctly...
>
> ...Robert
>
>

--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu(at)frogthinker(dot)org
Skype: emmanuel_cecchet
