Re: Roadmap for FE/BE protocol redesign

From: Barry Lind <blind(at)xythos(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Roadmap for FE/BE protocol redesign
Date: 2003-03-12 08:06:51
Message-ID: 3E6EEA9B.3050201@xythos.com
Lists: pgsql-hackers pgsql-interfaces


> * Backend should pass its version number, database encoding, default
> client encoding, and possibly other data (any ideas?) to frontend during
> startup, to avoid need for explicit queries to get this info. We could
> also consider eliminating SET commands sent by libpq in favor of adding
> variable settings to startup packet's PGOPTIONS field. Ideally we could
> get back to the point where a standard connection startup takes only one
> packet in each direction.

This handles the JDBC needs (currently on startup the jdbc driver
queries for the database encoding and server version number and sets the
datestyle and autocommit parameters).
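
For reference, the extra startup traffic the driver issues today looks
roughly like the sketch below (the connection URL, user, and password
are placeholders, and the exact statements the driver sends may differ
slightly by driver version):

    import java.sql.*;

    public class StartupTraffic {
        public static void main(String[] args) throws SQLException {
            Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/test", "user", "password");
            Statement st = con.createStatement();

            // Round trips the driver makes after connecting today; all of
            // this information could instead travel with the startup packet.
            ResultSet rs = st.executeQuery("select version()");
            rs.next();
            String serverVersion = rs.getString(1);
            rs.close();

            rs = st.executeQuery("select getdatabaseencoding()");
            rs.next();
            String encoding = rs.getString(1);
            rs.close();

            st.execute("set datestyle to 'ISO'");

            System.out.println(serverVersion + " / " + encoding);
            st.close();
            con.close();
        }
    }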

One addition I would personally like to see (it comes up in my apps
code) is the ability to detect whether the server is big endian or
little endian. When using binary cursors this is necessary in order to
read int data. Currently I issue a 'select 1' statement at connection
startup to determine which byte order the server is using.
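
The check itself is trivial once the raw bytes are in hand; it amounts
to something like this sketch (how the four bytes of the int4 come back
from the binary cursor is driver-specific, so treat the byte[] as a
given):

    public class EndianProbe {
        // Raw wire bytes of the int4 value 1, as fetched from a binary cursor
        // over "select 1" (DECLARE c BINARY CURSOR FOR select 1; FETCH c).
        static boolean isBigEndian(byte[] raw) {
            if (raw == null || raw.length != 4)
                throw new IllegalArgumentException("expected 4 bytes of int4 data");
            if (raw[3] == 1) return true;    // 00 00 00 01 -> big endian
            if (raw[0] == 1) return false;   // 01 00 00 00 -> little endian
            throw new IllegalStateException("unexpected representation of 1");
        }

        public static void main(String[] args) {
            System.out.println(isBigEndian(new byte[]{0, 0, 0, 1}));   // true
            System.out.println(isBigEndian(new byte[]{1, 0, 0, 0}));   // false
        }
    }

If the backend reported its byte order as part of the startup response,
both this probe and the binary-cursor round trip would go away.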

Other things I would like to see to help jdbc:

1) More information about the attributes selected in a query (I see
there is an entire thread on this already) to minimize the work
necessary to implement updateable result sets as defined by the jdbc spec.

2) Better support for domains. Currently the jdbc driver is broken with
regard to domains (although no one has reported this yet). The driver
will treat a datatype that is a domain as an unknown/unsupported
datatype. It would be great if the T response included the 'base'
datatype for a domain attribute so that the driver would know what
parsing routines to call to convert to/from the text representation the
backend expects.
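
Today the only way I know of to get that information is an extra
catalog query per unknown type oid, along the lines of this sketch (the
helper name is made up; pg_type.typbasetype is 0 for non-domain types,
and a domain defined over another domain would need the lookup repeated
until typbasetype is 0):

    import java.sql.*;

    public class DomainLookup {
        // Resolve a (possibly domain) type oid to its base type oid with an
        // extra query.  If the T message carried the base type, this round
        // trip would be unnecessary.
        static int baseTypeOid(Connection con, int typeOid) throws SQLException {
            PreparedStatement ps = con.prepareStatement(
                    "select typbasetype from pg_catalog.pg_type where oid = ?");
            ps.setInt(1, typeOid);
            ResultSet rs = ps.executeQuery();
            try {
                if (!rs.next())
                    return typeOid;
                int base = rs.getInt(1);
                return base != 0 ? base : typeOid;   // 0 means not a domain
            } finally {
                rs.close();
                ps.close();
            }
        }
    }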

3) Protocol level support for CURSORs. It would be nice if cursor
support was done at the protocol level and not as a SQL command. The
current default behavior of returning all results from a query in the
query response message is often a problem (it can easily lead to
out-of-memory errors for poorly written queries). So it is desirable to use
cursors. But with the current implementation in SQL, cursors are not
the appropriate choice if a query is only going to return one or a few
rows. The reason is that using a cursor requires a minimum of three SQL
statements: DECLARE, FETCH, CLOSE. The jdbc driver issues the DECLARE
and FETCH in one server call, but the CLOSE needs to be a second call.
Thus for simple one-row selects (which in many cases are the majority of
selects issued), using cursors requires two round trips to the server vs.
one for the non-cursor case.
This leaves me with a problem in the jdbc driver: I can either use
standard fast/performant queries for single-row selects that blow up with
out-of-memory errors for large results, or I can use cursors and avoid
large memory usage but hurt overall performance. What I have currently
done is require that the developer call an extra method to turn on the
use of cursors when they know that the query is going to return a large
number of rows, and leave the non-cursor case as the default. This works
but requires that developers who are writing code to interact with
multiple different databases code differently for the postgres jdbc
driver. And this is a problem, since one of the goals of jdbc is to be
able to write code that works against multiple different databases.
So I would request the ability of the client to set a max rows parameter
for query results. If a query were to return more than the max
number of rows, the client would be given a handle (essentially a cursor
name) that it could use to fetch additional sets of rows.
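
To make the round-trip arithmetic concrete, here is a sketch of the two
patterns as the driver sees them (the table name is made up; in the real
driver the DECLARE and the first FETCH are combined into a single server
call, shown here as separate statements for readability):

    import java.sql.*;

    public class CursorTraffic {
        public static void main(String[] args) throws SQLException {
            Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/test", "user", "password");
            con.setAutoCommit(false);    // cursors only live inside a transaction
            Statement st = con.createStatement();

            // Non-cursor case: one round trip, but the driver buffers the
            // whole result set -- fine for one row, fatal for millions.
            ResultSet rs = st.executeQuery("select * from items where id = 1");
            while (rs.next()) { /* ... */ }
            rs.close();

            // Cursor case: DECLARE plus the first FETCH can share a trip,
            // but the CLOSE is unavoidably a second trip, even for one row.
            st.execute("declare c cursor for select * from items");
            ResultSet rows = st.executeQuery("fetch 100 from c");
            while (rows.next()) { /* ... */ }
            rows.close();
            st.execute("close c");       // the extra round trip

            con.commit();
            st.close();
            con.close();
        }
    }

With a protocol-level max rows setting, the first pattern could be used
everywhere, and the handle would only come into play when the limit was
actually exceeded.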

4) Protocol level support for PREPARE. In jdbc and most other
interfaces, there is support for parameterized SQL. If you want to take
advantage of the performance benefits of reusing parsed plans, you have
to use the PREPARE SQL statement. My complaint about doing this at the
SQL level vs the protocol level is similar to the problem with cursors
above. To use prepare you need to issue three SQL statements: PREPARE,
EXECUTE, DEALLOCATE. If you know ahead of time that you are going to
reuse a statement many times doing PREPARE, EXECUTE, EXECUTE, ...,
DEALLOCATE makes sense and can be a big win in performance. However if
you only ever execute the statement once, then you need two round trips
(one for the PREPARE and EXECUTE together, and another for the
DEALLOCATE) versus one round trip to execute the statement 'normally'.
So it decreases performance to use prepares for all parameterized SQL
statements. As a result, the current implementation in jdbc requires the
user to issue a postgres-specific call to turn on the use of prepared
statements for those cases the developer knows will be a performance
win. But this requires coding differently for postgres jdbc than for
other databases. So being able to handle this better in the protocol
would be nice.
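
Again, a sketch of the statement traffic to make the trade-off concrete
(the table name is made up; the one-shot case pays for a DEALLOCATE
round trip that a plain SELECT would not):

    import java.sql.*;

    public class PrepareTraffic {
        public static void main(String[] args) throws SQLException {
            Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/test", "user", "password");
            Statement st = con.createStatement();

            // Many executions: the PREPARE and DEALLOCATE round trips are
            // amortized over the EXECUTEs -- a clear win.
            st.execute("prepare get_item (int) as select * from items where id = $1");
            for (int id = 1; id <= 1000; id++) {
                ResultSet rs = st.executeQuery("execute get_item (" + id + ")");
                while (rs.next()) { /* ... */ }
                rs.close();
            }
            st.execute("deallocate get_item");

            // Single execution: even if PREPARE and EXECUTE share one call,
            // the DEALLOCATE is a second round trip versus one plain SELECT.
            st.execute("prepare one_shot (int) as select * from items where id = $1;"
                     + " execute one_shot (42)");
            st.execute("deallocate one_shot");

            st.close();
            con.close();
        }
    }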

5) Better support for "large values". Generally I recommend that users
of jdbc use bytea to store large binary values, and I tell people to
avoid using LOs (Large Objects). The reason for this is that LOs have
two significant problems: 1) security - any user on the database can
access all LOs, even though they may not be able to access the row that
contains the LO reference; 2) cleanup - deleting the row containing the
LO reference doesn't delete the LO itself, requiring extra code or
triggers to make it behave like a regular value in a regular column.
Bytea works OK for small to medium-sized values, but doesn't work for
very large values, where by very large I mean over a few megabytes. The
reason very large values are a problem is memory usage. There is no way
to 'stream' bytea values from the server as you can with LOs, so the
driver ends up storing the entire value in memory as it reads the query
result from the backend. And if the query returns multiple rows, each
with a large value, you quickly run out of memory.
So what I would like to see is the ability for the client to set a MAX
VALUE size parameter. The server would send up to this amount of data
for any column. If the value was longer than MAX VALUE, the server
would respond with a handle that the client could use to get the rest of
the value (in chunks of MAX VALUE) if it wanted to. This would allow
the client to get the entire result set, which could contain many large
bytea values, without using a lot of memory up front. It could then
fetch the full values only when/if the application asked for them,
streaming the result to the application and never bringing the entire
contents of the column into memory at once. (There are probably a
number of different implementation possibilities, so take this one as a
suggestion to explain what I would like to see, not necessarily how it
should be implemented.)
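
To illustrate the difference this would make on the client side, a
sketch (the documents table and column are made up):

    import java.io.InputStream;
    import java.sql.*;

    public class LargeValues {
        public static void main(String[] args) throws Exception {
            Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/test", "user", "password");
            Statement st = con.createStatement();

            ResultSet rs = st.executeQuery("select id, doc from documents");
            while (rs.next()) {
                // Today: the full bytea value for every row is already held
                // in the driver's memory by the time we get here, whether or
                // not the application reads it.
                InputStream in = rs.getBinaryStream(2);
                // Desired: with a MAX VALUE setting, the driver would hold
                // only the first chunk here and fetch the remainder from the
                // server, chunk by chunk, only as this stream is read.
                in.close();
            }
            rs.close();
            st.close();
            con.close();
        }
    }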

6) Better over-the-wire support for bytea. The current escaping of
binary data (e.g. \000 for a zero byte) results in a significant
expansion in the size of the data transmitted. It would be nice if
bytea data didn't incur 2 or 3 times data expansion (and all the CPU
cycles to convert to/from the escaped format). This may not be a
protocol issue, but IMHO the best way to fix it would be in the
protocol.
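
For a sense of scale, the expansion can be computed directly: roughly,
printable bytes pass through as one character, a backslash becomes two,
and everything else becomes a four-character octal escape. A small
sketch:

    public class ByteaExpansion {
        // Approximate size of a bytea value in the text output format the
        // server sends today: printable bytes unchanged, backslash doubled,
        // all other bytes as a 4-character octal escape like \000.
        static long textFormatLength(byte[] data) {
            long len = 0;
            for (byte b : data) {
                int c = b & 0xff;
                if (c == '\\') len += 2;
                else if (c < 0x20 || c > 0x7e) len += 4;
                else len += 1;
            }
            return len;
        }

        public static void main(String[] args) {
            byte[] random = new byte[1024 * 1024];
            new java.util.Random(0).nextBytes(random);
            // Close to 3 MB on the wire for 1 MB of random binary data, and
            // the client-to-server direction is worse because the backslashes
            // have to be doubled again inside the SQL literal.
            System.out.println(textFormatLength(random) + " bytes");
        }
    }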

thanks,
--Barry
