"Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
> In the data warehousing industry, data conversion and manipulation is
> normally kept distinct from data loading.
It's a bit strange to call this conversion or manipulation. One way or another
you have to escape whatever your delimiters are. How would you propose loading
strings that contain newlines?
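To make the point concrete, here is a minimal sketch (not from the original mail) of the backslash escaping that COPY's text format requires for embedded newlines, tabs, and backslashes; the helper name and sample values are hypothetical:

```python
# Sketch: escaping a field for PostgreSQL COPY text format
# (tab-delimited, backslash escapes). Helper and data are illustrative.
def escape_copy_field(value: str) -> str:
    """Escape backslash, tab, newline, and carriage return for COPY text format."""
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

row = ["line one\nline two", "plain"]
print("\t".join(escape_copy_field(f) for f in row))
# -> line one\nline two	plain
```

The point being: a string containing a literal newline cannot be loaded at all without some escaping convention, so "parsing the file format" is unavoidable.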
The ETL transformations you're talking about are a different beast entirely.
You're talking about things like canonicalizing case or looking up foreign key
ids to replace strings and such.
Simply parsing the file format properly isn't part of that game. Otherwise
where do you stop? You could take this to a silly extreme and say postgres
should just load each line as a record with a single text field and let
"tools" deal with the actual parsing. Or better yet, load the whole thing as a
single big blob.
Personally I would prefer to make prepared inserts as efficient as COPY and
deprecate COPY. Then we could have an entirely client-side tool that handled
as many formats as people want to implement without complicating the server.
Things like various vintages of Excel, fixed column files, etc should all be
handled as plugins for such a tool.
That would have the side benefit of allowing people to do other batch jobs
efficiently, by pipelining the parameters for hundreds of executions of a
prepared query over the network.
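A minimal client-side sketch of that pipelining idea: group the parameter rows into batches so many executions of one prepared statement share a round trip. The batching helper below is pure Python and illustrative; wiring it to an actual driver (e.g. psycopg2's `execute_batch`) is assumed, not shown:

```python
# Sketch: chunk parameter rows into fixed-size batches for pipelined
# execution of a single prepared statement. Names and sizes are illustrative.
from itertools import islice

def batches(rows, size):
    """Yield successive lists of up to `size` rows from an iterable."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

rows = [(i, f"name{i}") for i in range(250)]
print([len(b) for b in batches(rows, 100)])  # -> [100, 100, 50]
```

Each batch would then be sent as one network message carrying the parameters for all its executions, instead of one round trip per INSERT.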
Actually it seems like there's no particular reason the NOLOGGING option Tom
described (where it only inserts on new pages, doesn't have any special WAL
entries, just fsyncs at the end instead of WAL logging) can't work with
arbitrary inserts. Somehow, some state would have to be preserved to remember
which pages the nologging inserts have created, and locks held on those pages.