Skip site navigation (1) Skip section navigation (2)

Re: A few more questions

From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Chris Smith <chris(at)mindiq(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: A few more questions
Date: 2004-02-14 22:13:47
Message-ID: 402E9D9B.8070607@opencloud.com (view raw or flat)
Thread:
Lists: pgsql-jdbc
Chris Smith wrote:

> 3. For someone who knows the v3 protocol; there are a number of cases where
> multiple queries are issued by one statement in the driver.  My understanding
> is that this is not allowed in extended query.  Does that mean that we're
> stuck with multiple round-trips if we substitute extended query?  Or may I
> send multiple consecutive parse, bind, and execute commands without an
> intervening read of the result; and then read all the results at once?

The latter. Look at the Sync and Flush messages too -- they're designed 
to deal with that sort of batching of queries.

> This latter bit seems to me like it risks walking into a trap where I'm
> blocked on my send buffer waiting for the server to read more of a statement,
> and the server is blocked on its send buffer waiting for me to read the
> results from a previous execute.  But basically, I'm thinking there has to be
> a way in v3 extended query to avoid this problem; otherwise, it would be a
> potentially serious performance issue with no solution.  Is there a way?

I raised this with Barry Lind a while back when I was looking at batch 
updates. There doesn't seem to be an elegant way around it. Ideas:

Option #1: limit the total size of outstanding sent data

   while (more queries) {
     send next query
     size = 0
     while (more queries && size + next query size < threshold) {
        send next query
        size += next query size
     }

     send Sync (or possibly Flush depending on how you deal with errors)
     read results for sent queries, deal with errors, etc
   }

Then we set threshold based on the expected buffer sizes between us and 
the server (kernel buffer etc).

The above code is a bit subtle .. we only need to count queries after 
the first towards the threshold, since we assume we only block when 
there are greater than threshold bytes unread by the server, and at a 
minimum the server will read the first query from the stream before 
blocking.

Also we could probably avoid waiting ignore packets that don't generate 
large resultsets in this calculation (or track our idea of how much data 
should be coming back and only start throttling our queries when it gets 
dangerous).

Option #2: use a separate write (probably easier) or read thread. This 
means creating threads from the driver, which is new, and I'm not sure 
how the additional context switches on every query will affect performance.

Option #3: switch to NIO but I don't think that's really feasible since 
we'd require 1.4, and NIO has a bunch of resource-exhaustion issues in 
current JVMs anyway.

Option #4: set a socket timeout on writes. I'm not sure how reliable 
this is, though, or if we can safely recover from timeouts that do occur.

> 4. As I plan the changes I'd like to make for v3 protocol support, it's
> starting to look tempting to segregate a good bit of the code into separate
> classes for v2 and v3.  Thus, we'd have:
> 
>     V2QueryExecutor - For v2 code
>     V3QueryExecutor - For v3 code
>     QueryExecutorUtil - For code shared between both classes above

Or subclass BaseQueryExecutor -> V2/V3QueryExecutor. (six of one..)

This was pretty high on my list of refactoring to do. Also consider 
using a single query executor instance per connection and killing the 
existing static methods -- currently we check the protocol version and 
create a new instance per query which seems entirely unnecessary.

> and the same for AbstractJdbc*Statement and AbstractJdbc*ResultSet.

I'm not so sure about this one though. You run the danger of a lot of 
code duplication here.

I was looking at a design where the statement-execution and 
result-gathering code was broken out into a internal 
interface/implementation classes, with separate implementations for V2 
and V3. These talked only in the native postgresql types known to the 
backend (int2, int4, int8, bytea, text, etc). Then there was a single 
Statement/ResultSet implementation that delegated to an instance of the 
interface, i.e. all the JDBC<->postgresql type mapping and 
Statement/ResultSet behaviour glue was in this common code.

The other thing I was doing here was to move the server-side preparation 
and translation/parsing of queries down into those protocol-specific 
classes -- as V2 and V3 work quite differently in these areas (the most 
obvious bit being that using the V3 extended protocol, you have to parse 
the query for multiple statements; you'd also want to maintain different 
state about cursors/PREPARE vs. named portals and statements).

I have some very early interface code I can send you if you'd like to 
take a look.

-O

In response to

Responses

pgsql-jdbc by date

Next:From: Tom AnsleyDate: 2004-02-14 22:30:39
Subject: unsubscribe
Previous:From: Dave CramerDate: 2004-02-14 21:32:20
Subject: Re: Object serialization and postgres tables,

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group