Proposal: Create index on foreign table

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Proposal: Create index on foreign table
Date: 2012-03-16 08:44:25
Message-ID: 4F62FD69.2060007@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have a plan to support 'Create index on foreign table' for 9.3. Here
is my plan.

The index creation is supported for a flat file such as CSV and a remote
table on a RDB e.g., Postgres using CREATE INDEX. (I thought using a
new statement, CREATE FOREIGN INDEX, at first, but I think that CREATE
INDEX would be sufficient to define an index for the foreign table.)
For a flat file, CREATE INDEX constructs an index in the same way as an
index for a regular table. On the other hand, for a remote table,
CREATE INDEX collects information about the index on the specified
column(s) for the specified table that was created on the remote table.
An index created is stored in pg_class and pg_index like an index for a
regular table. It depends on the wrappers implementation whether it
supports the options such as UNIQUE or WHERE predicates, though I think
that CONCURRENTLY is not supported in common for the foreign tables.
For a flat file, I plan that the user can specify all the options
excluding CONCURRENTLY and UNIQUE. On the other hand, for a remote
table, I think that the user can specify only the names of the foreign
table and its column(s), using which the wrapper collects information
about all the related indexes created on the remote table. To do so,
I'd like to propose new FDW callback routines:

CreateIndex(): This is called maybe from DefineIndex(), and does the
similar task to index_create(). For a flat file, this function makes
the catalog entries for the index and actually build the index, while
for a remote table, it just stores the catalog entries collected from
the remote end.
DropIndex(): This is called at DROP INDEX, and does the similar task to
index_drop().

I'd like to build the index physical data file for a flat file using the
index access method of regular tables (ie btree, hash, gin, gist, and
spgist) based on the following transformation between the TID and the
file offset to some data in the file:

block_number = file_offset_to_some_data / BLCKSZ
offset_number = file_offset_to_some_data % BLCKSZ

I plan to make use of the above index for better query optimization.
For a flat file, I'd like to realize index scans, index-only scans,
bitmap (like) scans and parametrized scans on the file in the same way
as those on a regular table utilizing the currently revised FDW
infrastructure. For a remote table, I have to admit that I don't have
any clear idea to make use of the index information stored in the system
catalogs for better query optimization, but I believe that it's useful
for the ORDER BY push down and/or nestloop-with-inner-parametrized-scan
join optimization.

Thoughts?

Best regards,
Etsuro Fujita

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2012-03-16 08:45:15 Re: Command Triggers, v16
Previous Message Andres Freund 2012-03-16 08:43:53 Re: Command Triggers, v16