Re: WIP Patch: Use sortedness of CSV foreign tables for query planning

From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>
Cc: "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch: Use sortedness of CSV foreign tables for query planning
Date: 2012-08-06 02:41:48
Message-ID: 005a01cd737d$06d549c0$147fdd40$@lab.ntt.co.jp
Lists: pgsql-hackers

Hi Robert,

> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]

> On Thu, Aug 2, 2012 at 7:01 AM, Etsuro Fujita
> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > The following is a comment at fileGetForeignPaths() in contrib/file_fdw.c:
> >
> > /*
> > * If data file was sorted, and we knew it somehow, we could insert
> > * appropriate pathkeys into the ForeignPath node to tell the planner
> > * that.
> > */
> >
> > To do this, I would like to propose new generic options for a file_fdw foreign
> > table to specify the sortedness of a data file. While it would be best to allow
> > specifying the sortedness on multiple columns, the current interface for the
> > generic options does not seem to be suitable for doing it. As a compromise, I
> > would like to propose single-column sortedness options and insert appropriate
> > pathkeys into the ForeignPath node based on this information:
>
> I am not sure it is a good idea to complicate file_fdw with frammishes
> of marginal utility. I guess I tend to view things like file_fdw as a
> mechanism for getting the data into the database, not necessarily
> something that you actually want to keep your data in permanently and
> run complex queries against.

I think file_fdw is useful for managing log files such as PG CSV logs. Since
such files are often sorted by timestamp, I think the patch can improve the
performance of log analysis, though I have to admit my demonstration was not
realistic.
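To make the log-analysis case concrete, here is a toy C model of what a single-column sortedness declaration would let the planner conclude. It is a sketch under stated assumptions, not file_fdw or PostgreSQL code: the names `SortednessOptions`, `sorted_by`, `descending`, `RequestedOrder`, `scan_satisfies_order` and `timestamps_nondecreasing` are all hypothetical, and the real planner matches PathKey lists rather than column names. The second function checks that a file really is sorted before the declaration is trusted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-column sortedness options for one foreign table.
 * "sorted_by" and "descending" are illustrative names only; they are not
 * actual file_fdw options. */
typedef struct
{
    const char *sorted_by;  /* column the data file is sorted on, or NULL */
    bool        descending; /* true if the file is in descending order */
} SortednessOptions;

/* The ordering a query asks for, e.g. ORDER BY log_time. */
typedef struct
{
    const char *column;
    bool        descending;
} RequestedOrder;

/*
 * Toy stand-in for pathkey matching: returns true if the scan's declared
 * order already satisfies the requested order, so no explicit sort step
 * would be needed. The real planner compares PathKey lists instead.
 */
static bool
scan_satisfies_order(const SortednessOptions *opts, const RequestedOrder *req)
{
    assert(opts != NULL && req != NULL);
    if (opts->sorted_by == NULL)
        return false;           /* nothing declared: a sort is required */
    return strcmp(opts->sorted_by, req->column) == 0 &&
           opts->descending == req->descending;
}

/*
 * Sanity check for the declaration itself: returns true if the given
 * timestamps are non-decreasing. Fixed-width timestamps such as
 * "2012-08-06 02:41:48" in a single time zone compare correctly with strcmp.
 */
static bool
timestamps_nondecreasing(const char *const *ts, size_t n)
{
    for (size_t i = 1; i < n; i++)
    {
        if (strcmp(ts[i - 1], ts[i]) > 0)
            return false;       /* found an out-of-order pair */
    }
    return true;
}
```

A declaration like this is a promise the planner would simply trust: if the file were not actually sorted, a query relying on it could return rows out of order. That is why a check along the lines of `timestamps_nondecreasing` might be worth running whenever a log file is rotated in.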

> It seems like that's the direction we're
> headed in here - statistics, indexing, etc. I am all in favor of
> having some kind of pluggable storage engine as an alternative to our
> heap, but I'm not sure a flat-file is a good choice.

As you pointed out, I would like to allow indexing to be done for CSV foreign
tables, but that is a separate problem. Neither the submitted patch nor the
above comment is a step toward indexing; the patch is, so to speak, just an
optimization of the current file_fdw module.

Thanks,

Best regards,
Etsuro Fujita
