Re: Ideas about a better API for postgres_fdw remote estimates

From: Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Ideas about a better API for postgres_fdw remote estimates
Date: 2020-09-09 04:36:17
Message-ID: CAG-ACPXUnC2WBBH=MfVZkhW98eoP6f_yRN-Y-SoWy7HTrh8giw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 9 Sep 2020 at 02:35, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote

>
> I think that was the topic of *this* thread as started by Tom, but I now
> realize Andrey steered it in the direction to allow re-using remote
> stats. Which seems useful too, but it confused me a bit.
>

I didn't realize that the nearby thread I am mentioning is actually this
thread :). Sorry.

>
> >Using EXPLAIN to get costs from the foreign server isn't efficient. It
> >increases planning time a lot; sometimes planning time exceeds execution
> >time. If usage of foreign tables becomes more and more common, this isn't
> >ideal. I think we should move towards a model in which the optimizer can
> >decide whether a subtree involving a foreign server should be evaluated
> >locally or on the foreign server without the help of foreign server. One
> >way to do it (I am not saying that this is the only or the best way) is to
> >estimate the cost of foreign query locally based on the information
> >available locally about the foreign server and foreign table. This might
> >mean that we have to get that information from the foreign server and
> cache
> >it locally and use it several times, including the indexes on foreign
> >table, values of various costs etc. Though this approach doesn't solve all
> >of those problems it's one step forward + it makes the current scenario
> >also efficient.
> >
>
> True, but that ptoject is way more ambitious than providing a simple API
> for postgres_fdw to obtain the estimates more efficiently.
>

Doing all of that is a big project. But what this patch aims at is a small
subset which makes statistics collection efficient and automatic. So, just
for that, we should consider it.

>
> >I agree that the patch needs some work though, esp the code dealing with
> >serialization and deserialization of statistics.
>
> I think there's a bunch of open questions, e.g. what to do with extended
> statistics - for example what should happen when the extended statistics
> object is defined only on local/remote server, or when the definitions
> don't match? What should happen when the definitions don't match? This
> probably is not an issue for "regular" stats, because that seems pretty
> stable, but for extended stats there are differences between versions.

If it is defined on the foreign server but not the local server, there is
no need to fetch it from the foreign server. The other way round case is
tricky. We could mark the extended statistics object invalid if it's not
defined on the foreign server or the definition is different. We have to
document it that way. I think that should serve most of the cases.

--
Best Wishes,
Ashutosh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-09-09 04:50:40 Re: INSERT INTO SELECT, Why Parallelism is not selected?
Previous Message Andres Freund 2020-09-09 04:11:47 Re: VACUUM (INTERRUPTIBLE)?