Re: NOLOGGING option, or ?

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Steve Atkins" <steve(at)blighty(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: NOLOGGING option, or ?
Date: 2005-06-02 15:26:28
Message-ID: BEC47334.6CC2%llonergan@greenplum.com
Lists: pgsql-hackers

Steve,

> I can only think of one where it's common. Windows filenames.

Nearly all weblog data then.

> But if
> you're going to support arbitrary data in a load then whatever escape
> character you choose will appear sometimes.

If we allow an 8-bit character set in the "text" file, then yes, any
delimiter you choose has the potential to appear in your input data. In
practice, with *mostly* 7-bit ASCII characters and even with international
8-bit text encodings, you can choose a delimiter and newline that work well.
Exceptions are handled by the forthcoming single row error handling patch.
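
To illustrate choosing a delimiter that works for a given data set, here is
a minimal Python sketch. It scans sample rows and returns the first candidate
byte that never occurs in them. The name pick_delimiter and the candidate
list are hypothetical, for illustration only; they are not part of COPY or
of any patch discussed here.

```python
from typing import Iterable, Optional


def pick_delimiter(rows: Iterable[bytes],
                   candidates: bytes = b"|\t,;\x01") -> Optional[int]:
    """Return the first candidate byte value absent from all rows, or None.

    Hypothetical helper: the candidate list is an assumption, not a
    recommendation from the thread.
    """
    seen = set()
    for row in rows:
        seen.update(row)  # iterating over bytes yields ints in 0..255
    for c in candidates:
        if c not in seen:
            return c
    return None  # every candidate occurs somewhere in the sample


sample = [b"GET /a|b HTTP/1.1", b"C:\\temp\\x.log,200"]
delim = pick_delimiter(sample)
```

A sample can only suggest a delimiter; rows outside the sample may still
contain it, which is where per-row error handling comes in.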

> I strongly suspect that a patch to improve performance without changing
> behaviour would be accepted with no questions asked.

Understood - not sure it's the best thing for support of the users yet.
We've found a large number of issues from customers with the unmodified
behavior.

> There are already two loader routines. One of them is text-based and is
> designed for easy generation of data load format using simple text
> manipulation tools by using delimiters. It also allows (unlike your
> suggestion) for loading of arbitrary data from a text file.

Not to distract, but try loading a binary null (a 0x00 byte) into a text
field. The assumption of null-terminated strings penetrates deep into the
codebase. The existing system does not allow for loading arbitrary data from
a text file.
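
The general effect of null-terminated strings can be shown with a small
Python sketch using ctypes. This illustrates C string semantics in general,
not the PostgreSQL code path itself; c_string_view is a hypothetical name.

```python
import ctypes


def c_string_view(data: bytes) -> bytes:
    """Return what a reader of a null-terminated string would see of data."""
    buf = ctypes.create_string_buffer(data)  # copies data, appends a 0x00
    return buf.value                         # stops at the first 0x00 byte


payload = b"abc\x00def"            # 7 bytes with an embedded 0x00
visible = c_string_view(payload)   # everything after the 0x00 is lost
```

Any code that treats the field as a null-terminated string silently sees
only the part before the first 0x00 byte.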

Our suggestion allows for escapes, but requires the ability to specify
alternate characters or none.
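
A rough Python sketch of what "alternate characters or none" could mean for
a field splitter follows. The function split_fields and its parameters are
hypothetical and do not reflect the actual patch or COPY syntax. Unlike
COPY, this sketch does not interpret sequences such as \t or \n; an escape
simply makes the next character literal.

```python
from typing import List, Optional


def split_fields(line: str, delim: str = "\t",
                 escape: Optional[str] = "\\") -> List[str]:
    """Split line on delim; escape=None disables escape processing."""
    fields: List[str] = []
    cur: List[str] = []
    i = 0
    while i < len(line):
        ch = line[i]
        if escape is not None and ch == escape and i + 1 < len(line):
            cur.append(line[i + 1])  # take the next character literally
            i += 2
            continue
        if ch == delim:
            fields.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
        i += 1
    fields.append("".join(cur))
    return fields


row = r"C:\temp\new|200"
with_backslash = split_fields(row, delim="|")               # backslashes eaten
no_escape = split_fields(row, delim="|", escape=None)       # path preserved
alt_escape = split_fields("a^|b|c", delim="|", escape="^")  # '^' as escape
```

With the default backslash escape the Windows path is mangled; with no
escape, or an alternate escape character, it loads intact.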

> Because it allows for arbitrary data and uses delimiters to separate
> fields it has to use an escaping mechanism.
>
> If you want to be able to load arbitrary data and not have to handle
> escape characters there are two obvious ways to do it.

Let's dispense with the notion that we're suggesting no escapes (see above).

Binary with a bookends format is a fine idea and would be my personal
preference if it were fast, which it isn't. Customers in the web log
analysis and other data warehousing fields prefer "mostly 7-bit" ASCII text
input, which we're trying to support with this change.

- Luke
