>This is interesting position to be taken by PostgreSQL. For simple
>query, particularly with regards to reporting, one can go about using
>text for all column. But a lot of time, I deal with large dataset
>ingestions and extracts that tends to take many hours to perform. So I
>usually prefer binary form data for performance reason.
The performance benefits of "binary" over "text" formats are often less
than you might think. In the real world, on-the-wire binary formats
often need parsing, just like text formats, and if they don't match your
internal binary format, still more conversion is required.
>Most of our company's works are done in Java nowadays, but for
>large data ingestions/extract/conversion, a C program out performs Java
>program fair and square, no debate there (based on actual run-time
>data). For clarification, we deal with various business partners that
>we have to import/export large amount of data between heterogeneous
>databases that the wonderful technology of database replication does not
>apply for our use.
I regularly import and export millions of rows between python and
PostgreSQL - my experience is that the TCP connection to the database,
and the database itself (disk I/O) is often the limiting factor, rather
than my choice of language. Where python is the limiting factor, it's
typically the instantiation of objects representing row values that
dominates, and that would be true whether a text or binary format was
used. The remaining cases are where the processing is non-trivial, and
in those cases, I would much rather be working in a high level language
where bugs are easier to identify. It's rare indeed that I resort to
coding it in C.
>I can get by using the socket function, ntohl() and ntohs() for smallint
>and int4 for the conversion. But there is no such thing for the float4
>and float8 as well as the date, timestamp you mentioned. I wish there
>is such function or macro available for the C interface.
So do I. Serialising floats and doubles is a common enough activity,
and it defies belief that the standard C libraries don't provide a way
to do it. Having every application that wants to pass floats over the
net implement it's own scheme is just asking for bugs.
I would suggest looking at the PostgreSQL source for one example of how
to do floats. Timestamps are actually composite types made up of other
primitive binary types. The only gotcha with the time types is that they
can use ints or floats, depending on how the server was built. When
using protocol 3 with a version 8 server, there is a parameter that
informs you which was chosen.
>I do not agree with the "ease of portability" issue, because if that is
>the case, one can always give up performance and use Java for that
>purpose. Beside, it should not be that much more difficult to provide
>win32 exe for PostgreSQL with PQgetvalue() that return int2, int4,
>float4, float8, date, datetime/timestamp, etc.
Note that you're preaching to the choir - I'm just a PG user, like you,
and I find the "binary" situation less than optimal. I wouldn't have
used it if I could work out how to pass arrays as text parameters (but
now that I know how the binary types work, I suspect I could solve the
text array problem by looking at the PG source).
Andrew McNamara, Senior Developer, Object Craft
In response to
pgsql-interfaces by date
|Next:||From: Jeroen T. Vermeulen||Date: 2007-05-22 06:10:25|
|Subject: Re: PQgetvalue failed to return column value for
non-text data in binary format|
|Previous:||From: Jeff Lynn||Date: 2007-05-21 21:46:37|
|Subject: Re: PQgetvalue failed to return column value for non-text
data in binary format|