Re: Speed dblink using alternate libpq tuple storage

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>, greg(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, mmoncure(at)gmail(dot)com, shigeru(dot)hanada(at)gmail(dot)com
Subject: Re: Speed dblink using alternate libpq tuple storage
Date: 2012-03-22 22:07:16
Message-ID: 15437.1332454036@sss.pgh.pa.us
Lists: pgsql-hackers

Marko Kreen <markokr(at)gmail(dot)com> writes:
> [ patches against libpq and dblink ]

I think this patch needs a lot of work.

AFAICT it breaks async processing entirely, because many changes have been
made that fail to distinguish "insufficient data available as yet" from
"hard error". As an example, this code at the beginning of
getAnotherTuple:

/* Get the field count and make sure it's what we expect */
if (pqGetInt(&tupnfields, 2, conn))
! return EOF;

is considering three cases: it got a 2-byte integer (and can continue on),
or there aren't yet 2 more bytes available in the buffer, in which case it
should return EOF without doing anything, or pqGetInt detected a hard
error and updated the connection error state accordingly, in which case
again there is nothing to do except return EOF. In the patched code we
have:

/* Get the field count and make sure it's what we expect */
if (pqGetInt(&tupnfields, 2, conn))
! {
! /* Whole the message must be loaded on the buffer here */
! errmsg = libpq_gettext("protocol error\n");
! goto error_and_forward;
! }

which handles neither the second nor third case correctly: it thinks that
"data not here yet" is a hard error, and then makes sure it is an error by
destroying the parsing state :-(. And if in fact pqGetInt did log an
error, that possibly-useful error message is overwritten with an entirely
useless "protocol error" text.

I don't think the error return cases for the row processor have been
thought out too well either. The row processor is not in charge of what
happens to the PGresult, and it certainly has no business telling libpq to
just "exit immediately from the topmost libpq function". If we do that
we'll probably lose sync with the data stream and be unable to recover use
of the connection at all. Also, do we need to consider any error cases
for the row processor other than out-of-memory? If so it might be a good
idea for it to have some ability to store a custom error message into the
PGconn, which it cannot do given the current function API.
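
To illustrate, a callback shape like this would let the processor hand
back its own message.  This is a sketch, not the patch's actual
signature; PGdataValue and store_row_somewhere are stand-ins:

typedef struct
{
    int         len;            /* -1 means NULL */
    const char *value;
} PGdataValue;                  /* stand-in for the patch's column datum */

typedef int (*PQrowProcessor) (PGresult *res, const PGdataValue *columns,
                               const char **errmsgp, void *param);

static int
my_row_processor(PGresult *res, const PGdataValue *columns,
                 const char **errmsgp, void *param)
{
    if (!store_row_somewhere(columns, param))   /* hypothetical helper */
    {
        /* libpq would copy this into conn->errorMessage */
        *errmsgp = "could not store row in application buffer";
        return 0;               /* report failure without losing sync */
    }
    return 1;                   /* row consumed successfully */
}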

In the same vein, I am fairly uncomfortable with the blithe assertion that
a row processor can safely longjmp out of libpq. This was never foreseen
in the original library coding and there are any number of places that
that might break, now or in the future. Do we really need to allow it?
If we do, it would be a good idea to decorate the libpq functions that are
now expected to possibly longjmp with comments saying so. Tracing all the
potential call paths that might be aborted by a longjmp is an essential
activity anyway.
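
To see why, consider what a longjmp'ing processor skips (illustration
only, using the same assumed callback shape as above; the jump-target
setup is not shown):

#include <setjmp.h>

static sigjmp_buf row_jmp;

static int
jumping_row_processor(PGresult *res, const PGdataValue *columns,
                      const char **errmsgp, void *param)
{
    /*
     * This unwinds straight out of the protocol-parsing code, so the
     * logic that would normally advance conn->inStart past the consumed
     * message never runs; the next library call re-parses stale bytes
     * and protocol sync is gone.
     */
    siglongjmp(row_jmp, 1);
    return 0;                   /* not reached */
}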

Another design deficiency is PQskipResult(). This is badly designed for
async operation because once you call it, it will absolutely not give back
control until it's read the whole query result. (It'd be better to have
it set some kind of flag that makes future library calls discard data
until the query result end is reached.) Something else that's not very
well-designed about it is that it might throw away more than just incoming
tuples. As an example, suppose that the incoming data at the time you
call it consists of half a dozen rows followed by an ErrorResponse. The
ErrorResponse will be eaten and discarded, leaving the application no clue
why its transaction has been aborted, or perhaps even the entire session
cancelled. What we probably want here is just to transiently install a
row processor that discards all incoming data, but the application is
still expected to eventually fetch a PGresult that will tell it whether
the server side of the query completed or not.
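
Concretely, something like this sketch (PQsetRowProcessor being the
patch's installer function, if I have the name right):

static int
discard_row_processor(PGresult *res, const PGdataValue *columns,
                      const char **errmsgp, void *param)
{
    return 1;                   /* accept and throw away every row */
}

static void
skip_current_result(PGconn *conn)
{
    PGresult   *res;

    /* transiently discard rows, but still drain results normally */
    PQsetRowProcessor(conn, discard_row_processor, NULL);
    while ((res = PQgetResult(conn)) != NULL)
        PQclear(res);           /* an ErrorResponse still surfaces here */
    /* (a real version would restore the previous processor here) */
}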

In the dblink patch, given that you have to worry about elogs coming out
of BuildTupleFromCStrings and tuplestore_puttuple anyway, it's not clear
what is the benefit of using malloc rather than palloc to manage the
intermediate buffers in storeHandler --- what that seems to mostly
accomplish is increase the risk of permanent memory leaks. I don't see
much value in per-column buffers either; it'd likely be cheaper to just
palloc one workspace that's big enough for all the column strings
together. And speaking of leaks, doesn't storeHandler leak the
constructed tuple on each call, not to mention whatever might be leaked by
the called datatype input functions?
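
Roughly what I have in mind is the following; the variable names are
hypothetical, and it assumes a short-lived memory context has been set
up for the handler:

/* per row, inside storeHandler */
MemoryContext oldcxt = MemoryContextSwitchTo(sinfo->tmpcontext);

/* one workspace big enough for all the column strings together */
char       *buf = palloc(total_len);

/* ... copy the column texts into buf, point cstrs[] into it ... */

HeapTuple   tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);

tuplestore_puttuple(sinfo->tuplestore, tuple);  /* copies the tuple */

MemoryContextSwitchTo(oldcxt);
MemoryContextReset(sinfo->tmpcontext);  /* frees buf and tuple; if an elog
                                         * escapes first, normal context
                                         * cleanup reclaims them anyway */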

It also seems to me that the dblink patch breaks the case formerly handled
in materializeResult() of a PGRES_COMMAND_OK rather than PGRES_TUPLES_OK
result. The COMMAND case is supposed to be converted to a single-column
text result, and I sure don't see where that's happening now.
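
For reference, the pre-patch code does roughly this:

if (PQresultStatus(res) == PGRES_COMMAND_OK)
{
    /* fake a one-column text result holding the command status */
    tupdesc = CreateTemplateTupleDesc(1, false);
    TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
                       TEXTOID, -1, 0);

    values[0] = PQcmdStatus(res);   /* e.g. "DELETE 42" */
    /* ... one tuple built from values[] goes into the tuplestore ... */
}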

BTW, am I right in thinking that some of the hunks in this patch are meant
to fix a pre-existing bug of failing to report the correct connection
name? If so, it'd likely be better to split those out and submit as a
separate patch, instead of conflating them with a feature addition.

Lastly, as to the pqgetrow patch, the lack of any demonstrated test case
for these functions makes me uncomfortable as to whether they are well
designed. Again, I'm unconvinced that the error handling is good or that
they work sanely in async mode. I'm inclined to just drop these for this
go-round, and to stick to providing the features that we can test via the
dblink patch.

regards, tom lane
