From: | "Luke Lonergan" <llonergan(at)greenplum(dot)com> |
---|---|
To: | "Steve Atkins" <steve(at)blighty(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: NOLOGGING option, or ? |
Date: | 2005-06-02 15:26:28 |
Message-ID: | BEC47334.6CC2%llonergan@greenplum.com |
Lists: | pgsql-hackers |
Steve,
> I can only think of one where it's common. Windows filenames.
Then that covers nearly all weblog data.
> But if
> you're going to support arbitrary data in a load then whatever escape
> character you choose will appear sometimes.
If we allow an 8-bit character set in the "text" file, then yes, any
delimiter you choose has the potential to appear in your input data. In
practice, with *mostly* 7-bit ASCII characters and even with international
8-bit text encodings, you can choose a delimiter and newline that work well.
Exceptions are handled by the forthcoming single row error handling patch.
> I strongly suspect that a patch to improve performance without changing
> behaviour would be accepted with no questions asked.
Understood, though I'm not yet sure that's the best way to support users.
We've seen a large number of customer issues with the unmodified
behavior.
> There are already two loader routines. One of them is text-based and is
> designed for easy generation of data load format using simple text
> manipulation tools by using delimiters. It also allows (unlike your
> suggestion) for loading of arbitrary data from a text file.
Not to distract, but try loading a binary NUL (0x00) byte into a text field. The
assumption of NUL-terminated strings runs deep in the codebase.
The existing system does not allow for loading arbitrary data from a text
file.
Our suggestion still allows for escapes, but requires that users be able to
specify an alternate escape character, or none at all.
> Because it allows for arbitrary data and uses delimiters to separate
> fields it has to use an escaping mechanism.
>
> If you want to be able to load arbitrary data and not have to handle
> escape characters there are two obvious ways to do it.
Let's dispense with the notion that we're suggesting no escapes (see above).
Binary with a bookends format is a fine idea and would be my personal
preference if it were fast, which it isn't. Customers in the web log
analysis and other data warehousing fields prefer "mostly 7-bit" ASCII text
input, which is what we're trying to support with this change.
- Luke