Re: Ideas about a better API for postgres_fdw remote estimates

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
Subject: Re: Ideas about a better API for postgres_fdw remote estimates
Date: 2020-09-04 14:57:51
Message-ID: 20200904145751.5zca73ihdfgh3eec@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 03, 2020 at 10:14:41AM +0500, Andrey V. Lepikhov wrote:
>On 8/31/20 6:19 PM, Ashutosh Bapat wrote:
>>On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
>><a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>>>
>>>Thanks for this helpful feedback.
>>I think the patch has some other problems like it works only for
>>regular tables on foreign server but a foreign table can be pointing
>>to any relation like a materialized view, partitioned table or a
>>foreign table on the foreign server all of which have statistics
>>associated with them. I didn't look closely but it does not consider
>>that the foreign table may not have all the columns from the relation
>>on the foreign server or may have different names. But I think those
>>problems are kind of secondary. We have to agree on the design first.
>>
>In accordance with discussion i made some changes in the patch:
>1. The extract statistic routine moved into the core.
>2. Serialized stat contains 'version' field to indicate format of
>statistic received.
>3. ANALYZE and VACUUM ANALYZE uses this approach only in the case of
>implicit analysis of the relation.
>
>I am currently keeping limitation of using the approach for regular
>relations only, because i haven't studied the specifics of another
>types of relations.
>But I don't know any reason to keep this limit in the future.
>
>The patch in attachment is very raw. I publish for further substantive
>discussion.
>

Thanks for working on this. I briefly looked at the patch today, and I
have some comments/feedback:

1) I wonder why deparseGetStatSql looks so different from e.g.
deparseAnalyzeSizeSql - no deparseStringLiteral on relname, no cast to
pg_catalog.regclass, function name not qualified with pg_catalog, ...

2) I'm a bit annoyed by the amount of code added to analyze.c only to
support output/input in JSON format. I'm no expert, but I don't recall
explain needing this much new stuff (OTOH it just produces json, it does
not need to read it). Maybe we also need to process wider range of data
types here. But the code is almost perfectly undocumented :-(

3) Why do we need to change vacuum_rel this way?

4) I wonder if we actually want/need to simply output pg_statistic data
verbatim like this. Is postgres_fdw actually going to benefit from it? I
kinda doubt that, and my assumption was that we'd return only a small
subset of the data, needed by get_remote_estimate.

This has a couple of issues. Firstly, it requires the knowledge of what
the stakind constants in pg_statistic mean and how to interpret it - but
OK, it's true that does not change very often (or at all). Secondly, it
entirely ignores extended statistics - OK, we might extract those too,
but it's going to be much more complex. And finally it entirely ignores
costing on the remote node. Surely we can't just apply local
random_page_cost or whatever, because those may be entirely different.
And we don't know if the remote is going to use index etc.

So is extracting data from pg_statistic the right approach?

5) I doubt it's enough to support relnames - we also need to estimate
joins, so this needs to support plain queries I think. At least that's
what Tom envisioned in his postgres_fdw_support(query text) proposal.

6) I see you've included a version number in the data - why not to just
check

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-09-04 15:00:31 Re: Allow continuations in "pg_hba.conf" files
Previous Message Alvaro Herrera 2020-09-04 14:41:40 Re: Change a constraint's index - ALTER TABLE ... ALTER CONSTRAINT ... USING INDEX ...