Re: foreign data wrappers

From: Zheng Yang <zhengyang4k(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Guillaume Lelarge <guillaume(at)lelarge(dot)info>, Shiv <rama(dot)theone(at)gmail(dot)com>, Selena Deckelmann <selena(at)chesnok(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-students <pgsql-students(at)postgresql(dot)org>
Subject: Re: foreign data wrappers
Date: 2011-03-30 03:00:24
Message-ID: E1C78427-1287-46D8-8E66-8DC56E234F9A@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-students


Hi Andrew and Guillaume,

>
>
> On 03/29/2011 12:35 PM, Guillaume Lelarge wrote:
>> Le 29/03/2011 18:32, Andrew Dunstan a écrit :
>>>
>>> On 03/29/2011 11:48 AM, Guillaume Lelarge wrote:
>>>> Le 29/03/2011 13:28, Zheng Yang a écrit :
>>>>> Hi guys,
>>>>>
>>>>>>> I've briefly gone through the slides. Regarding the 6 callbacks, is
>>>>>>> that correct to say that a full table scan will always be performed
>>>>>>> irregardless of the sql statement,
>>>>>>> the FDW is blind to the sql query performed, right?
>>>>>> Yes, fairly much. If the feed is large you need some way to pass a
>>>>>> limit to the foreign side, possibly via table options. I'm fairly
>>>>>> sure you won't be able to get it via the SELECT statement.
>>>>>>
>>>>> Regarding the previous flickr example, I'm wondering how this 'free
>>>>> text search' function can be done if the FDW is blind to the SELECT
>>>>> statement.
>>>>>
>>>>> For instance, the following query is to retrieve a photo relevant to
>>>>> 'panda':
>>>>>
>>>>> SELECT photo FROM flickr_table WHERE search LIKE '%panda%';
>>>>>
>>>>> In this case, the FDW can only open a connection to flickr web
>>>>> service and return the next 'row' .
>>>>> The problem is that there are a huge number of photos in flickr
>>>>> server and retrieving them sequentially is not realistic.
>>>>> Any ideas on how this can be done?
>>>>>
>>>> It probably means that flickr is not a good example of a nice fdw.
>>>
>>> Neither of you are being very creative. As I mentioned above, you need
>>> to embed this sort of stuff in table options.
>>>
>>> so you would have something like:
>>>
>>> create foreign table panda_flickr (photo bytea, ...)
>>> server flickr_server
>>> options (searchterm 'panda', maxrows '50');
>>> select photo from panda_flickr;
>>>
>> This would work but means you need to create a new foreign table to
>> search something else.
>>
>> So, yeah, it works, but it's not convenient.
>
> The other possibility is that you can dig down into the ForiegnScanState object. The FDW routines are passed a ForeignScanState object which contains a ScanState object which in turn contains a PlanState object which has a list of quals. You probably need to dig quite a bit further but that's a start.

I think this is a common issue for all FDWs that need to access remote resources over a network. For example, if there were a Mysql FDW, a full table scan implies the whole table will be transferred over.
it is not quite efficient for large tables.

If a table size is 1GB, iterating the whole table row by row means those 1GB of data needs to be transferred over. This may take hours even if for an sql statement as simple as
SELECT * from table where id = 1;

cheers,
ZY

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Viliam Ďurina 2011-03-30 03:45:01 Re: Problem calling setweight function from JDBC
Previous Message Worgan, Craig (Craig) 2011-03-30 01:35:29 Re: RPM for ODBC driver

Browse pgsql-students by date

  From Date Subject
Next Message Tomáš Pospíšil 2011-03-30 23:02:30 GSoC 2011 - indexing XML data
Previous Message Andrew Dunstan 2011-03-29 16:51:53 Re: foreign data wrappers