Re: Speed dblink using alternate libpq tuple storage

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>, greg(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, mmoncure(at)gmail(dot)com, shigeru(dot)hanada(at)gmail(dot)com
Subject: Re: Speed dblink using alternate libpq tuple storage
Date: 2012-03-29 22:56:30
Message-ID: 16223.1333061790@sss.pgh.pa.us
Lists: pgsql-hackers

Marko Kreen <markokr(at)gmail(dot)com> writes:
> My conclusion is that the row-processor API is a low-level expert API and
> quite easy to misuse. It would be preferable to have something more
> robust as the end-user API; PQgetRow() is my suggestion for that.
> Thus I see 3 choices:

> 1) Push row-processor as main API anyway and describe all dangerous
> scenarios in documentation.
> 2) Have both PQgetRow() and row-processor available in <libpq-fe.h>,
> PQgetRow() as the preferred API and row-processor for expert usage,
> with proper documentation of what works and what does not.
> 3) Have PQgetRow() in <libpq-fe.h>, move row-processor to <libpq-int.h>.

I still am failing to see the use-case for PQgetRow. ISTM the entire
point of a special row processor is to reduce the per-row processing
overhead, but PQgetRow greatly increases that overhead. And it doesn't
reduce complexity much either IMO: you still have all the primary risk
factors arising from processing rows in advance of being sure that the
whole query completed successfully. Plus it conflates "no more data"
with "there was an error receiving the data" or "there was an error on
the server side". PQrecvRow alleviates the per-row-overhead aspect of
that but doesn't really do a thing from the complexity standpoint;
it doesn't look to me to be noticeably easier to use than a row
processor callback.
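
To make the "conflates no more data with an error" point concrete, here is a minimal, self-contained C sketch. It does not use libpq at all: `RowSource`, `get_row_ambiguous()` and `get_row_status()` are hypothetical names invented for illustration, and the actual PQgetRow() signature from the patch is not reproduced here. The first getter returns NULL both at end of data and on failure; the second returns an explicit status.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row source: a fixed array of rows that can be told to
 * fail partway through, mimicking a network or server-side error. */
typedef struct {
    const char **rows;
    size_t nrows;
    size_t next;      /* index of the next row to hand out */
    size_t fail_at;   /* index where an error occurs; >= nrows means never */
} RowSource;

/* Style A: NULL-returning getter. NULL means "done" OR "error", and the
 * caller cannot tell which from the return value alone. */
const char *get_row_ambiguous(RowSource *src)
{
    assert(src != NULL);
    if (src->next >= src->nrows || src->next == src->fail_at)
        return NULL;
    return src->rows[src->next++];
}

/* Style B: explicit tri-state status keeps the two cases apart. */
typedef enum { ROW_OK, ROW_DONE, ROW_ERROR } RowStatus;

RowStatus get_row_status(RowSource *src, const char **row)
{
    assert(src != NULL && row != NULL);
    if (src->next >= src->nrows)
        return ROW_DONE;          /* checked first: normal end of data */
    if (src->next == src->fail_at)
        return ROW_ERROR;         /* failure before this row arrived */
    *row = src->rows[src->next++];
    return ROW_OK;
}
```

The tri-state version only removes the ambiguity. In the failing case the caller has still consumed row "a" before the error surfaces, which is the "processing rows in advance of being sure that the whole query completed successfully" risk that remains with any row-at-a-time interface.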

I think PQgetRow and PQrecvRow just add more API calls without making
any fundamental improvements, and so we could do without them. "There's
more than one way to do it" is not necessarily a virtue.

> The second conclusion is that the current dblink row-processor usage is broken
> when the user issues multiple SELECTs in one SQL string, as dblink uses plain PQexec().

Yeah. Perhaps we should tweak the row-processor callback API so that
it gets an explicit notification that "this is a new resultset".
Duplicating PQexec's behavior would then involve having the dblink row
processor throw away any existing tuplestore and start over when it
gets such a call.

There are multiple ways to express that, but the most convenient thing
from libpq's viewpoint, I think, is to have a callback that occurs
immediately after collecting a RowDescription message, before any
rows have arrived. So maybe we could express that as a callback
with valid "res" but "columns" set to NULL?
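
A sketch of the "columns == NULL means a new resultset" convention, and of how a dblink-style processor could use it to mimic PQexec's behavior of keeping only the last result. Everything here is hypothetical: `ResultHeader`, `ColumnValue`, `RowProcessor`, `TupleStore` and `deliver_resultset()` are stand-ins written for this example, not libpq or dblink code, and the "return 1 to continue" convention is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the proposed API (not real libpq types). */
typedef struct { int ncols; } ResultHeader;   /* plays the role of "res" */
typedef struct { const char *value; int len; } ColumnValue;

/* Assumed convention: columns == NULL means a RowDescription just arrived
 * and a new resultset is starting. Return 1 to continue, 0 to abort. */
typedef int (*RowProcessor)(const ResultHeader *res,
                            const ColumnValue *columns, void *param);

#define MAX_ROWS 16

/* Toy "tuplestore": remembers the first column of each stored row. */
typedef struct {
    const char *first_col[MAX_ROWS];
    size_t nrows;
    int nresultsets;
} TupleStore;

int dblink_like_processor(const ResultHeader *res,
                          const ColumnValue *columns, void *param)
{
    TupleStore *store = param;
    assert(res != NULL && store != NULL);
    if (columns == NULL) {
        /* New resultset: throw away rows from any earlier SELECT. */
        store->nrows = 0;
        store->nresultsets++;
        return 1;
    }
    if (store->nrows >= MAX_ROWS)
        return 0;                         /* out of space: stop */
    store->first_col[store->nrows++] = columns[0].value;
    return 1;
}

/* Simulates libpq delivering one resultset: header call, then each row. */
int deliver_resultset(RowProcessor proc, void *param,
                      const ResultHeader *res,
                      const ColumnValue *rows, size_t nrows)
{
    if (!proc(res, NULL, param))
        return 0;
    for (size_t i = 0; i < nrows; i++)
        if (!proc(res, &rows[i * (size_t) res->ncols], param))
            return 0;
    return 1;
}
```

Because the header-only call happens before any rows, the processor sees the reset even for a resultset that turns out to contain zero rows.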

A different approach would be to add a row counter to the arguments
provided to the row processor; then you'd know a new resultset had
started if you saw rowcounter == 0. This might have another advantage
of not requiring the row processor to count the rows for itself, which
I think many row processors would otherwise have to do.
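
The row-counter alternative, sketched under the same caveats: `CountingRowProcessor`, `Summary` and `deliver_rows()` are hypothetical, and the 0-based `rowcounter` argument is assumed from the proposal above rather than taken from any released libpq header. The processor detects a new resultset by `rowcounter == 0` and keeps no count of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: the caller passes the 0-based row number within
 * the current resultset. Not a real libpq signature. */
typedef int (*CountingRowProcessor)(const char *const *columns, int ncols,
                                    long rowcounter, void *param);

typedef struct {
    long rows_in_current;    /* rows seen in the current resultset */
    int resultsets;          /* resultsets observed so far */
    const char *last_value;  /* first column of the most recent row */
} Summary;

int summarizing_processor(const char *const *columns, int ncols,
                          long rowcounter, void *param)
{
    Summary *s = param;
    assert(s != NULL && ncols > 0);
    if (rowcounter == 0)
        s->resultsets++;     /* first row of a new resultset */
    /* No private counter: the caller-supplied rowcounter is enough. */
    s->rows_in_current = rowcounter + 1;
    s->last_value = columns[0];
    return 1;
}

/* Simulates the library feeding single-column rows with a row counter. */
int deliver_rows(CountingRowProcessor proc, void *param,
                 const char *const *firstcols, long nrows)
{
    for (long i = 0; i < nrows; i++) {
        const char *row[1] = { firstcols[i] };
        if (!proc(row, 1, i, param))
            return 0;
    }
    return 1;
}
```

One trade-off visible in this sketch: a resultset with zero rows never produces a `rowcounter == 0` call, so the processor cannot tell it arrived and still reports the previous set's row count. The header-only callback (columns set to NULL) does not have that blind spot.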

regards, tom lane
