Re: Question about optimising (Postgres_)FDW

From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Cc: shigeru(dot)hanada(at)gmail(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Question about optimising (Postgres_)FDW
Date: 2014-04-16 04:34:56
Message-ID: 534E0870.3050309@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04/16/2014 06:12 AM, Hannu Krosing wrote:
> On 04/16/2014 01:25 AM, Tom Lane wrote:
>> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
>>> Is there a way to force it to prefer a plan where the results of (select
>>> id from onemillion where data > '0.9' limit 100)
>>> are passed to FDW as a single IN ( = ANY(...)) query and are retrieved
>>> all at once ?
>> You could write the query like that:
>>
>> select * from onemillion_pgfdw where id = any (array(select id from
>> onemillion where data > '0.9' limit 100));
> My actual use-case was about a join between a local and a remote table
> and without rewriting the query (they come from ORM)
>
> I was hoping to be able to nudge postgresql towards a better plan via some
> tuning of table/fdw options or GUCs.
>
> for example, would postgresql use the WHERE id IN (...) query on remote
> side for a query like
>
> select r.data, l.data
> from onemillion_pgfdw r
> join onemillion l
> on r.id = l.id and l.data > '0.999';
>
> if it recognizes that the local side returns only 1000 rows ?
>
> or would it still use 1000 individual WHERE id = $1 queries.
>
> Is getting the foreign data via IN and then turning the data into a hash
> for joining one of the plans it considers at all ?
It sees that could we need an extra tuning parameter for choosing the

ID IN (...) + HASH plan over individual SELECT .. WHERE ID = $1

something between

`fdw_startup_cost` and `fdw_tuple_cost`

to signify that an IN query returning 1000 rows runs faster than 1000 =
queries

as I understan currently they both would be estimated as

fdw_startup_cost + 1000 * fdw_tuple_cost

the new parameter could be fdw_call_cost or fdw_query_cost and would
estimate
how much each individual call to fdw costs, thus favouring calls which
return more
data in one call

Cheers
Hannu
>
> Best
> Hannu
>
>> Or at least you should be able to, except when I try it I get
>>
>> explain analyze
>> select * from onemillion_pgfdw where id = any (array(select id from
>> onemillion where data > '0.9' limit 100));
>> ERROR: operator does not exist: integer = integer[]
>> HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
>> CONTEXT: Remote SQL command: EXPLAIN SELECT id, inserted, data FROM public.onemillion WHERE ((id = ANY ((SELECT null::integer[]))))
>>
>> so there's something the remote-estimate code is getting wrong here.
>> (It seems to work without remote_estimate, though.)
>>
>> regards, tom lane
>

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-04-16 04:44:09 Re: Clock sweep not caching enough B-Tree leaf pages?
Previous Message Amit Kapila 2014-04-16 04:27:10 Re: Clock sweep not caching enough B-Tree leaf pages?