Re: WIP: Collecting statistics on CSV file data

From: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: Collecting statistics on CSV file data
Date: 2011-11-07 11:26:50
Message-ID: 4EB7C07A.5030205@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2011/10/20 18:56), Etsuro Fujita wrote:
> I revised the patch according to Hanada-san's comments. Attached is the
> updated version of the patch.
>
> Changes:
>
> * pull up of logging "analyzing foo.bar"
> * new vac_update_relstats always called
> * tab-completion in psql
> * add "foreign tables are not analyzed automatically..." to 23.1.3
> Updating Planner Statistics
> * some other modifications

Submission review
=================

- Patch can be applied, and all regression tests passed. :)

Random comments
===============

- Some headers are not necessary for file_fdw.c

#include "access/htup.h"
#include "commands/dbcommands.h"
#include "optimizer/plancat.h"
#include "utils/elog.h"
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"

- It might be better to mention in document that users need to
explicitly specify foreign table name to ANALYZE command.

- I think backend should be aware the case which a handler is NULL. For
the case of ANALYZE of foreign table, it would be worth telling user
that request was not accomplished.

- file_fdw_do_analyze_rel is almost copy of do_analyze_rel. IIUC,
difference against do_analyze_rel are:
* don't logging analyze target
* don't switch userid to the owner of target table
* don't measure elapsed time for autoanalyze deamon
* don't handle index
* some comments are removed.
* sample rows are acquired by file_fdw's routine

I don't see any problem here, but would you confirm that all of them are
intentional?

Besides, keeping format (mainly indent and folding) of this function
similar to do_analyze_rel would help to follow changes in do_analyze_rel.

- IMHO exporting CopyState should be avoided. One possible idea is
adding new COPY API which allows to extract records from the file with
skipping specified number or rate.

- In your design, each FDW have to copy most of do_analyze_rel to their
own source. It means that FDW authors must know much details of ANALYZE
to implement ANALYZE handler. Actually, your patch exports some static
functions from analyze.c. Have you considered hooking
acquire_sample_rows()? Such handler should be more simple, and
FDW-specific. As you say, such design requires FDWs to skip some
records, but it would be difficult for some FDW (e.g. twitter_fdw) which
can't pick sample data up easily. IMHO such problem *must* be solved by
FDW itself.

--
Shigeru Hanada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2011-11-07 11:45:13 Re: Storing hot members of PGPROC out of the band
Previous Message Gregg Jaskiewicz 2011-11-07 11:14:41 git trunk doesn't build