Re: Speed dblink using alternate libpq tuple storage

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>, greg(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, mmoncure(at)gmail(dot)com, shigeru(dot)hanada(at)gmail(dot)com
Subject: Re: Speed dblink using alternate libpq tuple storage
Date: 2012-04-04 14:46:28
Message-ID: 25642.1333550788@sss.pgh.pa.us
Lists: pgsql-hackers

Marko Kreen <markokr(at)gmail(dot)com> writes:
> On Tue, Apr 03, 2012 at 05:32:25PM -0400, Tom Lane wrote:
>> Well, there are really four levels to the API design:
>> * Plain old PQexec.
>> * Break down PQexec into PQsendQuery and PQgetResult.
>> * Avoid waiting in PQgetResult by testing PQisBusy.
>> * Avoid waiting in PQsendQuery (ie, avoid the risk of blocking
>> on socket writes) by using PQisnonblocking.

> That's actually a nice overview. I think our basic disagreement comes
> from how we map the early-exit behavior onto those modes.
> I want to think of early-exit row processing as 5th and 6th modes:

> * Row-by-row processing on sync connection (PQsendQuery() + ???)
> * Row-by-row processing on async connection (PQsendQuery() + ???)

> But instead you want it to work with almost no changes to the existing modes.

Well, the trouble with the proposed PQgetRow/PQrecvRow is that they only
work safely at the second API level. They're completely unsafe to use
with PQisBusy, and I think that is a show-stopper. In your own terms,
the "6th mode" doesn't work.
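(For reference, the second API level above, PQsendQuery plus PQgetResult, looks
roughly like this with released libpq calls; the helper name and the error
handling are illustrative, not from the patch under discussion:)

```c
/* Level 2: break PQexec into PQsendQuery + PQgetResult.
 * Blocks inside PQgetResult until each result is complete. */
#include <stdio.h>
#include <libpq-fe.h>

static int
run_query(PGconn *conn, const char *query)
{
    PGresult *res;

    if (!PQsendQuery(conn, query))
    {
        fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
        return -1;
    }

    /* PQgetResult returns one PGresult per statement, then NULL. */
    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("%d rows\n", PQntuples(res));
        else if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "query failed: %s",
                    PQresultErrorMessage(res));
        PQclear(res);
    }
    return 0;
}
```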

More generally, it's not very safe to change the row processor while a
query is in progress. PQskipResult can get away with doing so, but only
because the entire point of that function is to lose data, and we don't
much care whether some rows already got handled differently. For every
other use-case, you have to set up the row processor in advance and
leave it in place, which is a guideline that PQgetRow/PQrecvRow violate.

So I think the only way to use row-by-row processing is to permanently
install a row processor that normally returns zero. It's possible that
we could provide a predefined row processor that acts that way and
invite people to install it. However, I think it's premature to suppose
that we know all the details of how somebody might want to use this.
In particular the notion of cloning the PGresult for each row seems
expensive and not obviously more useful than direct access to the
network buffer. So I'd rather leave it as-is and see if any common
usage patterns arise, then add support for those patterns.
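(As a sketch only: the row-processor hook being discussed here was a proposed
patch and never shipped in released libpq, which later grew PQsetSingleRowMode
instead. The callback name, the PGdataValue type, and the PQsetRowProcessor
call below follow the proposal approximately and should be read as
hypothetical:)

```c
/* Hypothetical row processor per the proposed (never-released) API.
 * Returning 0 suspends processing so the PQisBusy/PQgetResult caller
 * regains control before the whole result set has been absorbed. */
typedef struct
{
    int rows_seen;   /* whatever per-row state the app wants */
} MyState;

static int
my_row_processor(PGresult *res, const PGdataValue *columns,
                 const char **errmsgp, void *param)
{
    MyState *state = (MyState *) param;

    state->rows_seen++;
    /* ... copy the column data out of the network buffer here ... */
    return 0;        /* 0 = suspend; caller sees an early exit */
}

/* Installed once, before the query is issued, and left in place,
 * per the guideline above: */
/* PQsetRowProcessor(conn, my_row_processor, &state); */
```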

>> In particular, I flat out will not accept a design in which that option
>> doesn't work unless the current call came via PQisBusy, much less some
>> entirely new call like PQhasRowOrResult. It's unusably fragile (ie,
>> timing sensitive) if that has to be true.

> Agreed for PQisBusy, but why is PQhasRowOrResult() fragile?

Because it breaks if you use PQisBusy *anywhere* in the application.
That's not just a bug hazard but a loss of functionality. I think it's
important to have a pure "is data available" state test function that
doesn't cause data to be consumed from the connection, and there's no
way to have that if there are API functions that change the row
processor setting mid-query. (Another way to say this is that PQisBusy
ought to be idempotent from the standpoint of the application --- we
know that it does perform work inside libpq, but it doesn't change the
state of the connection so far as the app can tell, and so it doesn't
matter if you call it zero, one, or N times between other calls.)
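(A pure "is data available" state test in the sense described above composes
PQconsumeInput with PQisBusy; the helper name is illustrative. The PQisBusy
call itself can be repeated freely without consuming anything the application
can see:)

```c
/* Level 3: poll without blocking. Returns 1 when a PGresult is ready,
 * 0 when more input is still needed, -1 on connection trouble. */
#include <libpq-fe.h>

static int
result_ready(PGconn *conn)
{
    /* Absorb whatever has arrived on the socket; does not block. */
    if (!PQconsumeInput(conn))
        return -1;

    /* Idempotent from the app's standpoint: calling this zero, one,
     * or N times between other calls makes no visible difference. */
    return PQisBusy(conn) ? 0 : 1;
}
```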

regards, tom lane
