Re: WIP: Collecting statistics on CSV file data

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: Collecting statistics on CSV file data
Date: 2012-02-17 07:50:50
Message-ID: 4F3E06DA.5060108@lab.ntt.co.jp
Lists: pgsql-hackers

Hi Hanada-san,

Sorry for the late response.

(2012/02/10 22:05), Shigeru Hanada wrote:
> (2011/12/15 11:30), Etsuro Fujita wrote:
>> (2011/12/14 15:34), Shigeru Hanada wrote:
>>> I think this patch could be marked as "Ready for committer" with some
>>> minor fixes. Please find attached a revised patch (v6.1).
>
> I've tried to make pgsql_fdw work with this feature, and found that a few
> static functions need to be exported to implement an ANALYZE handler
> in short-cut style. "Short-cut style" means generating
> statistics (pg_class and pg_statistic) for foreign tables without
> retrieving sample data from the foreign server.

That's great! Here is my review.

The patch applies with some modifications and compiles cleanly. However,
the regression tests on subqueries failed, in addition to the
role-related tests discussed earlier.

I have not looked at the patch in detail yet, but I have some comments:

1. The patch might need code to handle the irregular case where
ANALYZE-related catalog data such as attstattarget differ between the
local and the remote side. (Currently ALTER FOREIGN TABLE has no option
to set such data on a foreign table, though.) For example, attstattarget
might be -1 for some column on the local side while it is 0 for that
column on the remote side, meaning that no stats can be available for
that column. In such a case it would be better to inform the user.
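
To illustrate the check I have in mind, here is a minimal sketch in
Python rather than the C the patch would need. The helper name
find_unavailable_stats and the dict-based inputs are hypothetical and
exist only for illustration; the one assumption taken from the catalogs
is that attstattarget = -1 means "use the default statistics target"
and attstattarget = 0 means "collect no statistics".

```python
# Hypothetical helper, not part of the patch: compares per-column
# attstattarget values of the local foreign table and the remote table.

def find_unavailable_stats(local_targets, remote_targets):
    """Return columns for which the local side expects stats but the
    remote side cannot have any.

    Both arguments map column name -> attstattarget, where -1 means
    "use the default statistics target" and 0 means "collect no stats".
    """
    mismatches = []
    for column, local_target in local_targets.items():
        remote_target = remote_targets.get(column)
        # The local side wants stats (any value but 0) while the remote
        # side collects none, so there is nothing to copy for this column.
        if local_target != 0 and remote_target == 0:
            mismatches.append(column)
    return mismatches


local = {"id": -1, "payload": -1}
remote = {"id": -1, "payload": 0}
for column in find_unavailable_stats(local, remote):
    print(f'WARNING: no remote statistics available for column "{column}"')
```

In a real handler the message would presumably be reported through the
usual error reporting machinery instead of being printed.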

2. It might be better for the FDW to estimate the cost of a remote
query by itself, without doing EXPLAIN, if stats are available through
this feature. While this approach is less accurate than the EXPLAIN
approach because information such as seq_page_cost or random_page_cost
on the remote side is missing, it is cheaper! I think such information
could be added to the generic options for a foreign table, which may
have been discussed previously.
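
As a rough illustration of the local estimate, here is a minimal sketch
in Python, not the code an FDW would actually contain. It uses a
simplified sequential-scan cost, seq_page_cost * relpages +
cpu_tuple_cost * reltuples, and ignores the cost of evaluating quals.
The function name estimate_remote_seqscan_cost and the table_options
dict standing in for per-table generic options are hypothetical; the
fallback values are PostgreSQL's default settings.

```python
# Fallback values: PostgreSQL's defaults for these planner parameters.
DEFAULT_COSTS = {
    "seq_page_cost": 1.0,
    "cpu_tuple_cost": 0.01,
}


def estimate_remote_seqscan_cost(relpages, reltuples, table_options=None):
    """Estimate the cost of a remote sequential scan from local stats.

    relpages and reltuples are the values stored locally in pg_class for
    the foreign table. table_options is a hypothetical dict of cost
    parameters supplied as generic options; missing entries fall back
    to DEFAULT_COSTS.
    """
    costs = dict(DEFAULT_COSTS)
    costs.update(table_options or {})
    # Disk cost for reading every page plus CPU cost for every tuple.
    return costs["seq_page_cost"] * relpages + costs["cpu_tuple_cost"] * reltuples


# Remote defaults unknown: fall back to the local defaults.
print(estimate_remote_seqscan_cost(100, 10000))
# Remote seq_page_cost supplied through a (hypothetical) generic option.
print(estimate_remote_seqscan_cost(100, 10000, {"seq_page_cost": 2.0}))
```

The point is only that no round trip to the remote server is needed
once relpages and reltuples are stored locally.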

3.
> In implementing ANALYZE handler, hardest part was copying anyarray
> values from remote to local. If we can make it common in core, it would
> help FDW authors who want to implement ANALYZE handler without
> retrieving sample rows from remote server.

+1 from me.

Best regards,
Etsuro Fujita
