Re: WIP: Collecting statistics on CSV file data

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: Collecting statistics on CSV file data
Date: 2011-11-18 07:25:55
Message-ID: 4EC60883.2050905@lab.ntt.co.jp
Lists: pgsql-hackers

(2011/11/07 20:26), Shigeru Hanada wrote:
> (2011/10/20 18:56), Etsuro Fujita wrote:
>> I revised the patch according to Hanada-san's comments. Attached is the
>> updated version of the patch.
>>
>> Changes:
>>
>> * pull up of logging "analyzing foo.bar"
>> * new vac_update_relstats always called
>> * tab-completion in psql
>> * add "foreign tables are not analyzed automatically..." to 23.1.3
>> Updating Planner Statistics
>> * some other modifications
>
> Submission review
> =================
>
> - Patch can be applied, and all regression tests passed. :)

Thank you for testing the patch. I have updated it according to your
comments. Attached is the updated version of the patch.

> - file_fdw_do_analyze_rel is almost a copy of do_analyze_rel. IIUC, the
> differences from do_analyze_rel are:
> * it doesn't log the analyze target
> * it doesn't switch userid to the owner of the target table
> * it doesn't measure elapsed time for the autoanalyze daemon
> * it doesn't handle indexes
> * some comments are removed
> * sample rows are acquired by file_fdw's routine
>
> I don't see any problem here, but would you confirm that all of them are
> intentional?

Yes. But in the updated version, I've refactored analyze.c a little bit
to allow FDW authors to simply call do_analyze_rel().

> - In your design, each FDW has to copy most of do_analyze_rel into its
> own source. It means that FDW authors must know many details of ANALYZE
> to implement an ANALYZE handler. Actually, your patch exports some static
> functions from analyze.c. Have you considered hooking
> acquire_sample_rows()? Such a handler should be simpler and
> FDW-specific. As you say, such a design requires FDWs to skip some
> records, but that would be difficult for some FDWs (e.g. twitter_fdw) which
> can't pick up sample data easily. IMHO such a problem *must* be solved by
> the FDW itself.

The updated version lets FDW authors simply write their own
acquire_sample_rows(). At the same time, it keeps the AnalyzeForeignTable
hook in analyze_rel(), at a higher level than acquire_sample_rows(), as
before. This lets an FDW for foreign tables on a remote server provide an
AnalyzeForeignTable routine that asks the server for its current stats
instead of sampling rows, as pointed out earlier by Tom Lane.
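
To make the two hook levels concrete, here is a rough, self-contained
sketch in C. It is not code from the attached patch and does not use
PostgreSQL's real types or signatures: every name in it (ToyStats,
toy_do_analyze, toy_analyze_rel, and so on) is hypothetical, and the
sampling is a simple deterministic stride rather than the random sampling
a real ANALYZE would use. It only illustrates the idea that one FDW can
plug in its own row sampler and reuse a shared driver, while another can
bypass sampling entirely and return stats obtained from a remote server.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TARGROWS 4          /* hypothetical sample size */

/* Hypothetical stand-in for the statistics ANALYZE would store. */
typedef struct ToyStats {
    double ntuples;             /* estimated total number of rows */
    double mean;                /* mean of the single numeric column */
} ToyStats;

/* Low-level hook: fill rows[] with at most targrows values, report the
 * total row count, and return the number of rows actually sampled. */
typedef int (*AcquireSampleFn)(void *source, double *rows,
                               int targrows, double *totalrows);

/* High-level hook: produce stats for a foreign table in any way. */
typedef void (*AnalyzeForeignFn)(void *source, ToyStats *out);

/* Shared driver, loosely playing the role of do_analyze_rel(). */
static void toy_do_analyze(void *source, AcquireSampleFn acquire,
                           ToyStats *out)
{
    double rows[TOY_TARGROWS];
    double total = 0.0, sum = 0.0;
    int n = acquire(source, rows, TOY_TARGROWS, &total);

    for (int i = 0; i < n; i++)
        sum += rows[i];
    out->ntuples = total;
    out->mean = (n > 0) ? sum / n : 0.0;
}

/* A file-backed "FDW": it only supplies a sampler. */
typedef struct ToyFile { const double *values; int nrows; } ToyFile;

static int toy_file_acquire(void *source, double *rows,
                            int targrows, double *totalrows)
{
    const ToyFile *f = source;
    int step = (f->nrows + targrows - 1) / targrows;  /* ceiling division */
    int n = 0;

    if (step < 1)
        step = 1;
    /* Stride sampling: a simplification, not random sampling. */
    for (int i = 0; i < f->nrows && n < targrows; i += step)
        rows[n++] = f->values[i];
    *totalrows = f->nrows;
    return n;
}

static void toy_file_analyze(void *source, ToyStats *out)
{
    toy_do_analyze(source, toy_file_acquire, out);
}

/* A remote "FDW": it skips sampling and copies the server's stats. */
typedef struct ToyRemote { ToyStats remote_stats; } ToyRemote;

static void toy_remote_analyze(void *source, ToyStats *out)
{
    const ToyRemote *r = source;
    *out = r->remote_stats;
}

/* Dispatcher, loosely playing the role of the hook in analyze_rel(). */
static void toy_analyze_rel(AnalyzeForeignFn handler, void *source,
                            ToyStats *out)
{
    assert(handler != NULL);
    handler(source, out);
}
```

Calling toy_analyze_rel(toy_file_analyze, &file, &stats) goes through the
shared driver with the file sampler, whereas
toy_analyze_rel(toy_remote_analyze, &remote, &stats) never samples at all.
Keeping the hook at the higher level is what makes the second case
possible.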

Best regards,
Etsuro Fujita

Attachment: postgresql-analyze-v4.patch (text/plain, 39.1 KB)
