Re: Speed dblink using alternate libpq tuple storage

From:	Marko Kreen <markokr(at)gmail(dot)com>
To:	Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>, mmoncure(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, greg(at)2ndquadrant(dot)com
Subject:	Re: Speed dblink using alternate libpq tuple storage
Date:	2012-02-24 15:46:16
Message-ID:	20120224154616.GA16985@gmail.com
Lists:	pgsql-hackers

On Tue, Feb 14, 2012 at 01:39:06AM +0200, Marko Kreen wrote:
> I tried imagining some sort of getFoo() style API for fetching in-flight
> row data, but I always ended up with "rewrite libpq" step, so I feel
> it's not productive to go there.
>
> Instead I added simple feature: rowProcessor can return '2',
> in which case getAnotherTuple() does early exit without setting
> any error state. In user side it appears as PQisBusy() returned
> with TRUE result. All pointers stay valid, so callback can just
> stuff them into some temp area. ATM there is no indication though
> whether the exit was due to callback or other reasons, so user
> must detect it based on whether new temp pointers appear,
> which means those must be cleaned before calling PQisBusy() again.
> This actually feels like feature, those must not stay around
> after single call.
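
The early-exit behaviour quoted above can be modelled without libpq at all. What follows is a minimal toy sketch, not the patch itself: toy_conn, toy_row_processor(), toy_is_busy() and toy_count_rows() are hypothetical names invented for illustration, and toy_is_busy() merely stands in for PQisBusy(). It only shows the "clear temp pointer, call, check whether the pointer reappeared" pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are libpq types or functions. */
typedef struct {
    const char **rows;    /* rows "already received from the network" */
    size_t       nrows;
    size_t       next;    /* index of the next unparsed row */
    const char  *cur_row; /* temp pointer stashed by the callback */
} toy_conn;

/* Row callback: stash the pointer and return 2 to request an early exit. */
static int toy_row_processor(toy_conn *c, const char *row)
{
    c->cur_row = row;
    return 2;
}

/*
 * Models PQisBusy(): parses rows until the callback asks to stop.
 * Returns 1 ("busy") on early exit, 0 once the input is exhausted.
 */
static int toy_is_busy(toy_conn *c)
{
    while (c->next < c->nrows) {
        int rc = toy_row_processor(c, c->rows[c->next++]);
        if (rc == 2)
            return 1;
    }
    return 0;
}

/* Iterates over all rows and returns how many were seen. */
static size_t toy_count_rows(toy_conn *c)
{
    size_t seen = 0;

    for (;;) {
        int busy;

        c->cur_row = NULL;          /* must be cleared before each call */
        busy = toy_is_busy(c);
        if (c->cur_row != NULL) {   /* exit was caused by the callback */
            assert(busy);
            seen++;                 /* a real caller would process the row here */
        } else if (!busy) {
            break;                  /* no row and not busy: result is done */
        }
        /* busy without a row cannot happen in this toy; a real caller
         * would wait for more network data at this point. */
    }
    return seen;
}
```

Calling toy_count_rows() on a toy_conn that holds three rows visits each row exactly once, and the temp pointer never survives past a single call.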

To see what iterating over a resultset would look like, I implemented a
PQgetRow() function using the currently available public API:

/*
 * Wait and return next row in resultset.
 *
 * returns:
 *   1 - row data available, the pointers are owned by PGconn
 *   0 - result done, use PQgetResult() to get final result
 *  -1 - some problem, check connection error
 */
int PQgetRow(PGconn *db, PGresult **hdr_p, PGrowValue **row_p);

code at:

https://github.com/markokr/libpq-rowproc-demos/blob/master/getrow.c

usage:

    /* send query */
    if (!PQsendQuery(db, q))
        die(db, "PQsendQuery");

    /* fetch rows one-by-one */
    while (1) {
        rc = PQgetRow(db, &hdr, &row);
        if (rc > 0)
            proc_row(hdr, row);
        else if (rc == 0)
            break;
        else
            die(db, "streamResult");
    }
    /* final PGresult, either PGRES_TUPLES_OK or error */
    final = PQgetResult(db);

It does not look like it can replace the public callback API,
because it does not work well with fully-async connections.
But it *does* continue the line of synchronous APIs:

- PQexec(): last result only
- PQsendQuery() + PQgetResult(): each result separately
- PQsendQuery() + PQgetRow() + PQgetResult(): each row separately

Also the generic implementation is slightly messy, because
it cannot assume anything about surrounding usage patterns,
while the same code living in some user framework can. But
for simple users who just want to synchronously iterate
over a resultset, it might be a good enough API?

It does have an inconsistency problem - the row data does
not live in PGresult but in a custom container. The proper
API pattern would be to have a PQgetRow() that gives a
functional PGresult, but that is not interesting for
high-performance users. Solutions:

- rename to PQrecvRow()
- rename to PQrecvRow() and additionally provide PQgetRow()
- Don't bother, let users implement it themselves via callback API.

Comments?

--
marko
