Re: Speed dblink using alternate libpq tuple storage

From: Marko Kreen <markokr(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>, greg(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, mmoncure(at)gmail(dot)com, shigeru(dot)hanada(at)gmail(dot)com
Subject: Re: Speed dblink using alternate libpq tuple storage
Date: 2012-03-30 15:52:47
Message-ID: 20120330155247.GA30010@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 29, 2012 at 06:56:30PM -0400, Tom Lane wrote:
> Marko Kreen <markokr(at)gmail(dot)com> writes:
> > My conclusion is that row-processor API is low-level expert API and
> > quite easy to misuse. It would be preferable to have something more
> > robust as end-user API, the PQgetRow() is my suggestion for that.
> > Thus I see 3 choices:
>
> > 1) Push row-processor as main API anyway and describe all dangerous
> > scenarios in documentation.
> > 2) Have both PQgetRow() and row-processor available in <libpq-fe.h>,
> > PQgetRow() as preferred API and row-processor for expert usage,
> > with proper documentation what works and what does not.
> > 3) Have PQgetRow() in <libpq-fe.h>, move row-processor to <libpq-int.h>.
>
> I still am failing to see the use-case for PQgetRow. ISTM the entire
> point of a special row processor is to reduce the per-row processing
> overhead, but PQgetRow greatly increases that overhead.

No, decreasing CPU overhead is minor win. I guess in realistic
application, like dblink, you can't even measure the difference.

The *major* win comes from avoiding buffering of all rows in PGresult.
Ofcourse, this is noticeable only with bigger resultsets.
I guess such buffering pessimizes memory usage: code always
works on cold cache. And simply keeping RSS low is good for
long-term health of a process.

Second major win is avoiding the need to use cursor with small chunks
to access resultset of unknown size. Thus stalling application
until next block arrives from network.

The "PGresult *PQgetRow()" is for applications that do not convert
rows immediately to some internal format, but keep using PGresult.
So they can be converted to row-by-row processing with minimal
changes to actual code.

Note that the PGrowValue is temporary struct that application *must*
move data away from. If app internally uses PGresult, then it's
pretty annoying to invent a new internal format for long-term
storage.

But maybe I'm overestimating the number of such applications.

> And it doesn't
> reduce complexity much either IMO: you still have all the primary risk
> factors arising from processing rows in advance of being sure that the
> whole query completed successfully.

It avoids the complexity of:

* How to pass error from callback to upper code

* Needing to know how exceptions behave

* How to use early exit to pass rows to upper code one-by-one,
(by storing the PGresult and PGrowValue in temp place
and later checking their values)

* How to detect that new resultset has started. (keeping track
of previous PGresult or noting some quirky API behaviour
we may invent for such case)

* Needing to make sure the callback does not leak to call-sites
that expect regular libpq behaviour.
("Always call PQregisterRowProcessor(db, NULL, NULL) after query finishes" )
["But now I'm in exception handler, how do I find the connection?"]

I've now reviewed the callback code and even done some coding with it
and IMHO it's too low-level to be end-user-API.

Yes, the "query-may-still-fail" complexity remains, but thats not unlike
the usual "multi-statement-transaction-is-not-guaranteed-to-succeed"
complexity.

Another compexity that remains is "how-to-skip-current-resultset",
but that is a problem only on sync connections and the answer is
simple - "call PQgetResult()". Or "call PQgetRow/PQrecvRow" if
user wants to avoid buffering.

> Plus it conflates "no more data"
> with "there was an error receiving the data" or "there was an error on
> the server side".

Well, current PQgetRow() is written with style: "return only single-row
PGresult, to see errors user must call PQgetResult()". Basically
so that user it forced to fall back familiar libpq usage pattern.

It can be changed, so that PQgetRow() returns also errors.

Or we can drop it and just keep PQrecvRow().

> PQrecvRow alleviates the per-row-overhead aspect of
> that but doesn't really do a thing from the complexity standpoint;
> it doesn't look to me to be noticeably easier to use than a row
> processor callback.

> I think PQgetRow and PQrecvRow just add more API calls without making
> any fundamental improvements, and so we could do without them. "There's
> more than one way to do it" is not necessarily a virtue.

Please re-read the above list of problematic situations that this API
fixes. Then, if you still think that PQrecvRow() is pointless, sure,
let's drop it.

We can also postpone it to 9.3, to poll users whether they want
easier API, or is maximum performance important. (PQrecvRow()
*does* have few cycles of overhead compared to callbacks.)

Only option that we have on the table for 9.2 but not later
is moving the callback API to <libpq-int.h>.

> > Second conclusion is that current dblink row-processor usage is broken
> > when user uses multiple SELECTs in SQL as dblink uses plain PQexec().
>
> Yeah. Perhaps we should tweak the row-processor callback API so that
> it gets an explicit notification that "this is a new resultset".
> Duplicating PQexec's behavior would then involve having the dblink row
> processor throw away any existing tuplestore and start over when it
> gets such a call.
>
> There's multiple ways to express that but the most convenient thing
> from libpq's viewpoint, I think, is to have a callback that occurs
> immediately after collecting a RowDescription message, before any
> rows have arrived. So maybe we could express that as a callback
> with valid "res" but "columns" set to NULL?
>
> A different approach would be to add a row counter to the arguments
> provided to the row processor; then you'd know a new resultset had
> started if you saw rowcounter == 0. This might have another advantage
> of not requiring the row processor to count the rows for itself, which
> I think many row processors would otherwise have to do.

Try to imagine how final documentation will look like.

Then imagine documentation for PGrecvRow() / PQgetRow().

--
marko

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-03-30 15:59:12 Re: Speed dblink using alternate libpq tuple storage
Previous Message Mike Roest 2012-03-30 15:51:58 pg_dump incredibly slow dumping a single schema from a large db