From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Question about optimising (Postgres_)FDW
Date: 2014-04-16 13:16:40
Message-ID: 534E82B8.9070205@krosing.net
Lists: pgsql-hackers

On 04/16/2014 01:35 PM, Etsuro Fujita wrote:
> (2014/04/16 6:55), Hannu Krosing wrote:
...
>
> Maybe I'm missing something, but I think that you can do what I think
> you'd like to do by the following procedure:
No, what I'd like PostgreSQL to do is:

1. select the id set from the local table
2. select the rows from the remote table with WHERE id IN (<set
   selected in step 1>)
3. join the original set to the selected set, with any suitable join
   strategy
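
As a sketch only (suppose step 1 had selected the ids 1, 17 and 42; the
real set would of course be bigger), the remote query I would like
postgres_fdw to send in step 2 is simply:

    SELECT id, inserted, data
      FROM public.onemillion
     WHERE id IN (1, 17, 42);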

The things I do not want are:

A. selecting all rows from the remote table
   (this is what your examples below do)

or

B. selecting the rows from the remote table one by one using "ID = $"
   (this is something that I managed to get by some tweaking of costs;
   see the sketch below)

since A will always be slow if there are millions of rows in the remote
table, and B is slow(ish) once the id set grows beyond a few hundred
ids.
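
For reference, with plan B the remote side sees one parameterized query
per id, roughly of this shape (a sketch; the exact SQL postgres_fdw
generates may differ slightly):

    SELECT id, inserted, data FROM public.onemillion WHERE id = $1;

and the cost tweaking I mean was along these lines (illustrative
values, not the exact ones I used; use ADD instead of SET if an option
is not yet defined on the object):

    ALTER FOREIGN TABLE onemillion_pgsql
        OPTIONS (SET use_remote_estimate 'true');
    ALTER SERVER loop OPTIONS (ADD fdw_tuple_cost '10.0');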

I hope this is a somewhat better explanation than the one I provided
before.

Cheers
Hannu

P.S. I am not sure whether this is a limitation of postgres_fdw or of
postgres itself.
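
For comparison, a purely local version of the same query behaves the
way I want (a sketch: onemillion_local here is a hypothetical plain
table with the same contents and an index on id):

    EXPLAIN SELECT * FROM onemillion_local
     WHERE id IN (SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);
    -- expected shape: a join driven by the 100-row id set, with an
    -- index scan on onemillion_local.id on the inner side, so the
    -- million rows are never scanned in full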

P.P.S. I tested a little with Multicorn, and PostgreSQL did not request
row counts for any IN plans, so it may be that the planner does not
consider this kind of plan at all. (Testing was on PostgreSQL 9.3.4.)

>
> postgres=# ALTER SERVER loop OPTIONS (ADD fdw_startup_cost '1000');
> ALTER SERVER
> postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in
> (SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);
>                                           QUERY PLAN
> -----------------------------------------------------------------------------------------------
>  Hash Semi Join  (cost=1023.10..41983.21 rows=100 width=30)
>    Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data
>    Hash Cond: (onemillion_pgsql.id = onemillion.id)
>    ->  Foreign Scan on public.onemillion_pgsql  (cost=1000.00..39334.00 rows=1000000 width=29)
>          Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data
>          Remote SQL: SELECT id, inserted, data FROM public.onemillion
>    ->  Hash  (cost=21.85..21.85 rows=100 width=4)
>          Output: onemillion.id
>          ->  Limit  (cost=0.00..20.85 rows=100 width=4)
>                Output: onemillion.id
>                ->  Seq Scan on public.onemillion  (cost=0.00..20834.00 rows=99918 width=4)
>                      Output: onemillion.id
>                      Filter: (onemillion.data > '0.9'::text)
>  Planning time: 0.690 ms
> (14 rows)
>
> or, as Tom mentioned, by disabling the use_remote_estimate option:
>
> postgres=# ALTER FOREIGN TABLE onemillion_pgsql OPTIONS (SET
> use_remote_estimate 'false');
> ALTER FOREIGN TABLE
> postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in
> (SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);
>                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------
>  Hash Semi Join  (cost=123.10..41083.21 rows=100 width=30)
>    Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data
>    Hash Cond: (onemillion_pgsql.id = onemillion.id)
>    ->  Foreign Scan on public.onemillion_pgsql  (cost=100.00..38434.00 rows=1000000 width=30)
>          Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data
>          Remote SQL: SELECT id, inserted, data FROM public.onemillion
>    ->  Hash  (cost=21.85..21.85 rows=100 width=4)
>          Output: onemillion.id
>          ->  Limit  (cost=0.00..20.85 rows=100 width=4)
>                Output: onemillion.id
>                ->  Seq Scan on public.onemillion  (cost=0.00..20834.00 rows=99918 width=4)
>                      Output: onemillion.id
>                      Filter: (onemillion.data > '0.9'::text)
>  Planning time: 0.215 ms
> (14 rows)
>
> Thanks,
>
> Best regards,
> Etsuro Fujita
>
>
