From: | David Fetter <david(at)fetter(dot)org> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Proposal: Create index on foreign table |
Date: | 2012-03-16 17:07:43 |
Message-ID: | 20120316170743.GE456@fetter.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Mar 16, 2012 at 11:58:29AM +0200, Heikki Linnakangas wrote:
> On 16.03.2012 10:44, Etsuro Fujita wrote:
> > I have a plan to support 'Create index on foreign table' for 9.3.
> > Here is my plan.
> >
> > The index creation is supported for a flat file such as CSV and a
As others, I don't see a reason to restrict this to some particular
type of FDW.
> > remote table on a RDB e.g., Postgres using CREATE INDEX. (I
> > thought using a new statement, CREATE FOREIGN INDEX, at first, but
> > I think that CREATE INDEX would be sufficient to define an index
> > for the foreign table.) For a flat file, CREATE INDEX constructs
> > an index in the same way as an index for a regular table.
>
> I think this belongs completely in the remote data source.
I think this needs to be decided on a case-by-case basis instead.
> If you want to index flat files, create an extra file for it or
> something, and enhance the wrapper so that it can take advantage of
> it. Keeping the index inside the database while the data is
> somewhere else creates a whole new class of problems.
How? These aren't super different from those for, say, unlogged
tables.
> For starters, how would you keep the index up-to-date when the flat
> file is modified?
One way is to poll the remote source for evidence of such changes
during auto_vacuum or with similar daemon processes. Another is by
waiting for a signal from an external source such as a NOTIFY. Which
is more appropriate will again depend on circumstances.
> If you want to take advantage of PostgreSQL's indexing, you'll just
> have to just load the data into a regular table.
I disagree. Indexing in general allows you to store only log-N index
rows for each N rows in an external table, which could be a very big
win. Deciding in advance for everyone that this is not worthwhile is
not in our purview.
> > On the other hand, for a remote table, CREATE INDEX collects
> > information about the index on the specified column(s) for the
> > specified table that was created on the remote table.
>
> I don't see the point of this either. The planner asks the FDW for
> cost estimates, and if the FDW knows about indexes in the remote
> server, it can certainly adjust the estimates accordingly. But
> that's all internal to the FDW. It might want delegate the whole
> cost estimation to the remote server by running EXPLAIN there, or it
> could use its knowledge of indexes that exist there, but I don't see
> why the rest of the system would need to know what indexes there are
> in the remote system.
Good point, for the remote index case, which I contend is not every
one :)
> If the FDW needs that information, it can query the remote server
> for it on first access, and cache the information for the lifetime
> of the session.
Of course, a mere few GB of information queried each time couldn't
possibly cause intolerable overheads...
Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Davis | 2012-03-16 17:14:49 | Re: initdb and fsync |
Previous Message | Peter Eisentraut | 2012-03-16 17:07:28 | Re: patch: autocomplete for functions |