Re: Ideas about a better API for postgres_fdw remote estimates

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Ideas about a better API for postgres_fdw remote estimates
Date: 2020-08-29 16:22:31
Message-ID: 20200829162231.GE29590@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Andrey Lepikhov (a(dot)lepikhov(at)postgrespro(dot)ru) wrote:
> During the implementation of sharding related improvements i noticed that if
> we use a lot of foreign partitions, we have bad plans because of vacuum
> don't update statistics of foreign tables.This is done by the ANALYZE
> command, but it is very expensive operation for foreign table.
> Problem with statistics demonstrates with TAP-test from the first patch in
> attachment.

Yes, the way we handle ANALYZE today for FDWs is pretty terrible, since
we stream the entire table across to do it.

> I implemented some FDW + pg core machinery to reduce weight of the problem.
> The ANALYZE command on foreign table executes query on foreign server that
> extracts statistics tuple, serializes it into json-formatted string and
> returns to the caller. The caller deserializes this string, generates
> statistics for this foreign table and update it. The second patch is a
> proof-of-concept.

Isn't this going to create a version dependency that we'll need to deal
with..? What if a newer major version has some kind of improved ANALYZE
command, in terms of what it looks at or stores, and it's talking to an
older server?

When I was considering the issue with ANALYZE and FDWs, I had been
thinking it'd make sense to just change the query that's built in
deparseAnalyzeSql() to have a TABLESAMPLE clause, but otherwise run in
more-or-less the same manner as today. If we don't like the available
TABLESAMPLE methods then we could add a new one that's explicitly the
'right' sample for an ANALYZE call and use that when it's available on
the remote side. Not sure if it'd make any sense for ANALYZE itself to
start using that same TABLESAMPLE code, but maybe? Not that I think
it'd be much of an issue if it's independent either, with appropriate
comments to note that we should probably try to make them match up for
the sake of FDWs.

> This patch speedup analyze command and provides statistics relevance on a
> foreign table after autovacuum operation. Its effectiveness depends on
> relevance of statistics on the remote server, but still.

If we do decide to go down this route, wouldn't it mean we'd have to
solve the problem of what to do when it's a 9.6 foreign server being
queried from a v12 server and dealing with any difference in the
statistics structures of the two?

Seems like we would... in which case I would say that we should pull
that bit out and make it general, and use it for pg_upgrade too, which
would benefit a great deal from having the ability to upgrade stats
between major versions also. That's a much bigger piece to take on, of
course, but seems to be what's implied with this approach for the FDW.

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-08-29 16:36:42 Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
Previous Message Tom Lane 2020-08-29 16:17:54 Re: More aggressive vacuuming of temporary tables