postgres_fdw cost estimation defaults and documentation

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: postgres_fdw cost estimation defaults and documentation
Date: 2017-06-05 18:37:09
Message-ID: CAMkU=1yYPsTxrne8=NghrGeyg4WWbXUce_Z-y=9vGW5G3na=2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The default value for fdw_tuple_cost is 0.01, which seems way too low. If
I set up a loop-back foreign server with a large fetch_size, then tests
like:

select * from pgbench_accounts except select * from
loopback.pgbench_accounts

vs

select * from pgbench_accounts except select * from pgbench_accounts

indicate that 0.1 is about the lowest value for fdw_tuple_cost that could
make sense, and a reasonable default would probably be 0.25. Yes, it is
only a default, but the default should make sense for at least some
situation, and I can't imagine any situation in which 0.01 makes sense.

In the documentation for fdw_startup_cost, it says "This represents the
additional overhead of establishing a connection, parsing and planning the
query on the remote side, etc.". I think that "establishing a connection"
should be stricken. Either you need a connection or you don't, there is
nothing the optimizer can do about this. And if do need one, you only
establish one once (at most), not once per query sent to the remote side.
I think the implementation correctly doesn't try to account for the
overhead of establishing a connection, so the docs should remove that claim.

In regards to use_remote_estimate, the documentation says "Running ANALYZE
on the foreign table is the way to update the local statistics; this will
perform a scan of the remote table and then calculate and store statistics
just as though the table were local. Keeping local statistics can be a
useful way to reduce per-query planning overhead for a remote table — but
if the remote table is frequently updated, the local statistics will soon
be obsolete." This makes it send like local stats is basically equivalent
to use_remote_estimate, other than the staleness issue. But they are far
from equivalent. use_remote_estimate has implicit knowledge of the indexes
on the foreign server (implicit via the reduced cost estimates derived from
the foreign side for parameterized queries), whereas local stats of foreign
tables just assumes there are no indexes for planning purposes. Perhaps
adding something like "Also, local statistics do not contain information on
the available indexes on the remote side, while use_remote_estimate does
take these into account"?

Cheers,

Jeff

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-06-05 18:57:10 Re: shm_toc_lookup API
Previous Message Shubham Barai 2017-06-05 17:40:35 GSoC 2017 weekly progress reports ("Explicitly support predicate locks in index access methods besides b-tree")