Re: WIP: Collecting statistics on CSV file data

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: Collecting statistics on CSV file data
Date: 2011-11-21 03:55:49
Message-ID: 4EC9CBC5.3070500@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Hanada-san,

Thank you for your valuable comments. I will improve the items pointed
out by you at the next version of the patch, including documentation on
the purpose of AnalyzeForeignTable, how to write it, and so on. Here I
comment only one point:

> - Why file_fdw skips sample tuples which have NULL value? AFAIS
> original acquire_sample_rows doesn't do so.

To be precise, I've implemented to skip tuples that have null values in
certain column(s) but that are not allowed to contain null values in
that(those) column(s) by NOT NULL constrain. file_fdw's
sample_row_acquirer considers those tuples as "dead" tuples. This is
for the consistency with NOT NULL constrain. (But I don't know why
fileIterateForeignScan routine allows such dead tuples. I may have
missed something.)

Best regards,
Etsuro Fujita

(2011/11/18 21:00), Shigeru Hanada wrote:
> (2011/11/18 16:25), Etsuro Fujita wrote:
>> Thank you for your testing. I updated the patch according to your
>> comments. Attached is the updated version of the patch.
>
> I'd like to share result of my review even though it's not fully
> finished. So far I looked from viewpoint of API design, code
> formatting, and documentation. I'll examine effectiveness of the patch
> and details of implementation next week, and hopefully try writing
> ANALYZE handler for pgsql_fdw :)
>
> New patch has correct format, and it could be applied to HEAD of master
> branch cleanly. All regression tests including those of contrib modules
> have passed. It contains changes of codes and regression tests related
> to the issue, and they have enough comments. IMO the document in this
> patch is not enough to show how to write analyze handler to FDW authors,
> but it can be enhanced afterward. With this patch, FDW author can
> provide optional ANALYZE handler which updates statistics of foreign
> tables. Planner would be able to generate better plan by using statistics.
>
>> Yes. But in the updated version, I've refactored analyze.c a little bit
>> to allow FDW authors to simply call do_analyze_rel().
> <snip>
>> The updated version enables FDW authors to just write their own
>> acquire_sample_rows(). On the other hand, by retaining to hook
>> AnalyzeForeignTable routine at analyze_rel(), higher level than
>> acquire_sample_rows() as before, it allows FDW authors to write
>> AnalyzeForeignTable routine for foreign tables on a remote server to ask
>> the server for its current stats instead, as pointed out earlier by Tom
>> Lane.
>
> IIUC, this patch offers three options to FDWs: a) set
> AnalyzeForeignTable to NULL to indicate lack of capability, b) provide
> AnalyzeForeignTable which calls do_analyze_rel with custom
> sample_row_acquirer, and c) create statistics data from scratch by FDW
> itself by doing similar things to do_analyze_rel with given argument or
> copying statistics from remote PostgreSQL server.
>
> ISTM that this design is well-balanced between simplicity and
> flexibility. Maybe these three options would suit web-based wrappers,
> file-based or RDBMS wrappers, and pgsql_fdw respectively. I think that
> adding more details of FdwRoutine, such as purpose of new callback
> function and difference from required ones, would help FDW authors,
> including me :)
>
> I have some random comments:
>
> - I think separated typedef of sample_acquire_rows would make codes more
> readable. In addition, parameters of the function should be described
> explicitly.
> - I couldn't see the reason why file_fdw sets ctid of sample tuples,
> though I guess it's for Vitter's random sampling algorithm. If every
> FDW must set valid ctid to sample tuples, it should be mentioned in
> document of AnalyzeForeignTable. Exporting some functions from
> analyze.c relates this issue?
> - Why file_fdw skips sample tuples which have NULL value? AFAIS
> original acquire_sample_rows doesn't do so.
> - Some comment lines go past 80 columns.
> - Some headers included in file_fdw.c seems unnecessary.
>
> Regards,

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2011-11-21 04:00:06 Re: WIP: Collecting statistics on CSV file data
Previous Message Jeff Davis 2011-11-21 03:24:16 Re: Singleton range constructors versus functional coercion notation