Re: Patch for binary receive of basic array types

From: Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Patch for binary receive of basic array types
Date: 2011-09-24 10:28:35
Message-ID: 4E7DB0D3.7030408@nitorcreations.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

On 09/24/2011 12:49 AM, Kevin Grittner wrote:
> Mikko Tiihonen<mikko(dot)tiihonen(at)nitorcreations(dot)com> wrote:
>
>> summary: speed-up 3-57%, row-length on the wire increases
>> typically between 50% and 200% (smaller increases for larger
>> arrays or complex types)
>>
>> I think the speed-up is enough to justify using of binary
>> transfers for arrays by default even though the data amount on the
>> wire is increased.
>
> What would it take to get the other behavior if you have a network
> bottleneck but CPU time to spare?

I did some (too much) measurements & calculations, which are below.

In general could add a mechanism to add/remove specific oids from the
binary transfer mode as a configuration option to the jdbc driver.

The same mechanism could be used for by application code that uses custom types
inside PostgreSQL to take advantage of binary transfers.

There will always be trade-offs between txt and bin encoding and sometimes
the differences are large enough to change from driver defaults.

----

Measurements:
for '{1,2,3}' stored in int4[] column
in txt form: column in bytes=7, whole response tcp packet in bytes=109
in bin form: column in bytes=44, whole response tcp packet in bytes=146
While column size increased 500%, the tcp packet increased 34%.

for '{1,2,3,....,99,100}' stored in int4[] column
in txt form: column in bytes=293, whole response tcp packet in bytes=395
in bin form: column in bytes=820, whole response tcp packet in bytes=922
While column size increased 180%, the tcp packet increased 130%.

for '{10000,10001,10002,..,10098,10098}' stored in int4[] column
in txt form: column in bytes=601, whole response tcp packet in bytes=703
in bin form: column in bytes=820, whole response tcp packet in bytes=922
While column size increased 32%, the tcp packet increased 31%.

Calculations:
The txt encoding uses 2-7 bytes per value in int2[].
The bin encoding uses 6 bytes per value in int2[] + 19 extra overhead bytes.

The txt encoding uses 2-12 bytes per value in int4[].
The bin encoding uses 8 bytes per value in int4[] + 19 extra overhead bytes.

The txt encoding uses 2-22 bytes per value in int8[].
The bin encoding uses 12 bytes per value in int8[] + 19 extra overhead bytes.

For very small arrays, and especially arrays containing small numeric values
the txt encoding uses less bytes but is slower to generate and parse.

For extreme case of only storing single digit numbers in int8[] the tcp packet
size of bin can be 6x the size of txt encoding.

For extreme case of only storing 21 digit negative numbers in int8[] the tcp packet
size of txt can be 1.8x the size of bin encoding.

Using multidimensional arrays is usually of small advantage to bin format.

One can transfer from 500k to 5000k txt int8 values per second in array or
800k bin int8 values per second in array over a 100mbit line.

In response to

Browse pgsql-jdbc by date

  From Date Subject
Next Message Mikko Tiihonen 2011-09-24 12:33:01 Patch for binary transfer of point and box geometric types
Previous Message alexbruy 2011-09-24 07:59:06 Re: Prepared statement with function as argument: how to bind values?