Re: Statistics Import and Export

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Statistics Import and Export
Date: 2023-11-07 09:23:54
Message-ID: CAExHW5uc7+Ed2J3jv1vQu063pSC7S7uP+HycCcPtrOZVLFYg1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 31, 2023 at 12:55 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
>>
>>
>> Yeah, that use makes sense as well, and if so then postgres_fdw would likely need to be aware of the appropriate query for several versions back - they change, not by much, but they do change. So now we'd have each query text in three places: a system view, postgres_fdw, and the bin/scripts pre-upgrade program. So I probably should consider the best way to share those in the codebase.
>>
>
> Attached is v2 of this patch.
>
> New features:
> * imports index statistics. This is not strictly accurate: it re-computes index statistics the same as ANALYZE does, which is to say it derives those stats entirely from table column stats, which are imported, so in that sense we're getting index stats without touching the heap.
> * now support extended statistics except for MCV, which is currently serialized as an difficult-to-decompose bytea field.
> * bare-bones CLI script pg_export_stats, which extracts stats on databases back to v12 (tested) and could work back to v10.
> * bare-bones CLI script pg_import_stats, which obviously only works on current devel dbs, but can take exports from older versions.
>

I did a small experiment with your patches. In a separate database
"fdw_dst" I created a table t1 and populated it with 100K rows
#create table t1 (a int, b int);
#insert into t1 select i, i + 1 from generate_series(1, 100000) i;
#analyse t1;

In database "postgres" on the same server, I created a foreign table
pointing to t1
#create server fdw_dst_server foreign data wrapper postgres_fdw
OPTIONS ( dbname 'fdw_dst', port '5432');
#create user mapping for public server fdw_dst_server ;
#create foreign table t1 (a int, b int) server fdw_dst_server;

The estimates are off
#explain select * from t1 where a = 100;
QUERY PLAN
-----------------------------------------------------------
Foreign Scan on t1 (cost=100.00..142.26 rows=13 width=8)
(1 row)

Export and import stats for table t1
$ pg_export_stats -d fdw_dst | pg_import_stats -d postgres

gives accurate estimates
#explain select * from t1 where a = 100;
QUERY PLAN
-----------------------------------------------------------
Foreign Scan on t1 (cost=100.00..1793.02 rows=1 width=8)
(1 row)

In this simple case it's working like a charm.

Then I wanted to replace all ANALYZE commands in postgres_fdw.sql with
import and export of statistics. But I can not do that since it
requires table names to match. Foreign table metadata stores the
mapping between local and remote table as well as column names. Import
can use that mapping to install the statistics appropriately. We may
want to support a command or function in postgres_fdw to import
statistics of all the tables that point to a given foreign server.
That may be some future work based on your current patches.

I have not looked at the code though.

--
Best Wishes,
Ashutosh Bapat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2023-11-07 09:33:03 Re: GUC names in messages
Previous Message Jelte Fennema-Nio 2023-11-07 09:23:12 Re: Add PQsendSyncMessage() to libpq