Re: Proposal: Create index on foreign table

From: David Fetter <david(at)fetter(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Create index on foreign table
Date: 2012-03-16 17:07:43
Message-ID: 20120316170743.GE456@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 16, 2012 at 11:58:29AM +0200, Heikki Linnakangas wrote:
> On 16.03.2012 10:44, Etsuro Fujita wrote:
> > I have a plan to support 'Create index on foreign table' for 9.3.
> > Here is my plan.
> >
> > The index creation is supported for a flat file such as CSV and a

As others, I don't see a reason to restrict this to some particular
type of FDW.

> > remote table on a RDB e.g., Postgres using CREATE INDEX. (I
> > thought using a new statement, CREATE FOREIGN INDEX, at first, but
> > I think that CREATE INDEX would be sufficient to define an index
> > for the foreign table.) For a flat file, CREATE INDEX constructs
> > an index in the same way as an index for a regular table.
>
> I think this belongs completely in the remote data source.

I think this needs to be decided on a case-by-case basis instead.

> If you want to index flat files, create an extra file for it or
> something, and enhance the wrapper so that it can take advantage of
> it. Keeping the index inside the database while the data is
> somewhere else creates a whole new class of problems.

How? These aren't super different from those for, say, unlogged
tables.

> For starters, how would you keep the index up-to-date when the flat
> file is modified?

One way is to poll the remote source for evidence of such changes
during auto_vacuum or with similar daemon processes. Another is by
waiting for a signal from an external source such as a NOTIFY. Which
is more appropriate will again depend on circumstances.

> If you want to take advantage of PostgreSQL's indexing, you'll just
> have to just load the data into a regular table.

I disagree. Indexing in general allows you to store only log-N index
rows for each N rows in an external table, which could be a very big
win. Deciding in advance for everyone that this is not worthwhile is
not in our purview.

> > On the other hand, for a remote table, CREATE INDEX collects
> > information about the index on the specified column(s) for the
> > specified table that was created on the remote table.
>
> I don't see the point of this either. The planner asks the FDW for
> cost estimates, and if the FDW knows about indexes in the remote
> server, it can certainly adjust the estimates accordingly. But
> that's all internal to the FDW. It might want delegate the whole
> cost estimation to the remote server by running EXPLAIN there, or it
> could use its knowledge of indexes that exist there, but I don't see
> why the rest of the system would need to know what indexes there are
> in the remote system.

Good point, for the remote index case, which I contend is not every
one :)

> If the FDW needs that information, it can query the remote server
> for it on first access, and cache the information for the lifetime
> of the session.

Of course, a mere few GB of information queried each time couldn't
possibly cause intolerable overheads...

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2012-03-16 17:14:49 Re: initdb and fsync
Previous Message Peter Eisentraut 2012-03-16 17:07:28 Re: patch: autocomplete for functions