Re: Speed dblink using alternate libpq tuple storage

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>, greg(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, mmoncure(at)gmail(dot)com, shigeru(dot)hanada(at)gmail(dot)com
Subject: Re: Speed dblink using alternate libpq tuple storage
Date: 2012-03-31 15:50:27
Message-ID: 20234.1333209027@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Marko Kreen <markokr(at)gmail(dot)com> writes:
> On Thu, Mar 29, 2012 at 06:56:30PM -0400, Tom Lane wrote:
>> Yeah. Perhaps we should tweak the row-processor callback API so that
>> it gets an explicit notification that "this is a new resultset".
>> Duplicating PQexec's behavior would then involve having the dblink row
>> processor throw away any existing tuplestore and start over when it
>> gets such a call.
>>
>> There's multiple ways to express that but the most convenient thing
>> from libpq's viewpoint, I think, is to have a callback that occurs
>> immediately after collecting a RowDescription message, before any
>> rows have arrived. So maybe we could express that as a callback
>> with valid "res" but "columns" set to NULL?
>>
>> A different approach would be to add a row counter to the arguments
>> provided to the row processor; then you'd know a new resultset had
>> started if you saw rowcounter == 0. This might have another advantage
>> of not requiring the row processor to count the rows for itself, which
>> I think many row processors would otherwise have to do.

I had been leaning towards the second approach with a row counter,
because it seemed cleaner, but I thought of another consideration that
makes the first way seem better. Suppose that your row processor has to
do some setup work at the start of a result set, and that work needs to
see the resultset properties (eg number of columns) so it can't be done
before starting PQgetResult. In the patch as submitted, the
only way to manage that is to keep enough state to recognize that the
current row processor call is the first one, which we realized is
inadequate for multiple-result-set cases. With a row counter argument
you can do the setup whenever rowcount == 0, which fixes that. But
neither of these methods deals correctly with an empty result set!
To make that work, you need to add extra logic after the PQgetResult
call to do the setup work the row processor should have done but never
got a chance to. So that's ugly, and it makes for an easy-to-miss bug.
A call that occurs when we receive RowDescription, independently of
whether the result set contains any rows, makes this a lot cleaner.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-03-31 15:59:47 Re: [COMMITTERS] pgsql: Add PGDLLIMPORT to ScanKeywords and NumScanKeywords.
Previous Message Andrew Dunstan 2012-03-31 15:42:47 Re: narwhal versus gnu_printf