WIP Patch: Use sortedness of CSV foreign tables for query planning

From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: WIP Patch: Use sortedness of CSV foreign tables for query planning
Date: 2012-08-02 11:01:27
Message-ID: 002501cd709e$29e7e760$7db7b620$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The following is a comment at fileGetForeignPaths() in contrib/file_fdw.c:

/*
* If data file was sorted, and we knew it somehow, we could insert
* appropriate pathkeys into the ForeignPath node to tell the planner
* that.
*/

To do this, I would like to propose new generic options for a file_fdw foreign
table to specify the sortedness of a data file. While it is best to allow to
specify the sortedness on multiple columns, the current interface for the
generic options dose not seems to be suitable for doing it. As a compromise, I
would like to propose single-column sortedness options and insert appropriate
pathkeys into the ForeignPath node based on these information:

sorted: Boolean option to specify whether data file is sorted by a column.
key: Specifies the name of a column by which data file is sorted. Required
when the above option is set to true.
direction: Specifies the sort order: asc or desc. The default is asc.
nulls: Specifies that nulls sort before or after non-nulls: first or last.
first is the default when direction option is set desc. When direction option
is not set desc, last is the default.

Attached is a WIP patch implementing this feature. I would like to demonstrate
the usefulness of the patch. Experimental results are shown below. Here, data
in /home/pgsql/relation.csv is sorted by aid in ascending order.

postgres=# CREATE EXTENSION file_fdw;
CREATE EXTENSION
postgres=# CREATE SERVER fs FOREIGN DATA WRAPPER file_fdw;
CREATE SERVER
postgres=# CREATE FOREIGN TABLE ft (aid INTEGER, bid INTEGER) SERVER fs OPTIONS
(filename '/home/pgsql/relation.csv', format 'csv', delimiter ',');
CREATE FOREIGN TABLE
postgres=# ANALYZE ft;
ANALYZE
postgres=# SELECT count(*) FROM ft;
count
----------
50000000
(1 row)

postgres=# SELECT count(DISTINCT aid) FROM ft;
count
---------
1000000
(1 row)

postgres=# EXPLAIN ANALYZE SELECT DISTINCT aid FROM ft ORDER BY aid;
QUERY PLAN

--------------------------------------------------------------------------------
----------------------------------------------------
Unique (cost=14187375.19..14437375.19 rows=1092929 width=4) (actual time=48952
.602..62788.934 rows=1000000 loops=1)
-> Sort (cost=14187375.19..14312375.19 rows=50000000 width=4) (actual time=
48952.601..56635.448 rows=50000000 loops=1)
Sort Key: aid
Sort Method: external sort Disk: 684272kB
-> Foreign Scan on ft (cost=0.00..5059137.00 rows=50000000 width=4) (
actual time=0.073..18324.062 rows=50000000 loops=1)
Foreign File: /home/pgsql/relation.csv
Foreign File Size: 484444500
Total runtime: 63019.868 ms
(8 rows)

postgres=# ALTER FOREIGN TABLE ft OPTIONS ( ADD sorted 'true', key 'aid' );
ALTER FOREIGN TABLE
postgres=# EXPLAIN ANALYZE SELECT DISTINCT aid FROM ft ORDER BY aid;
QUERY PLAN

--------------------------------------------------------------------------------
----------------------------------------------
Unique (cost=0.00..5184137.00 rows=1092929 width=4) (actual time=0.074..23124.
195 rows=1000000 loops=1)
-> Foreign Scan on ft (cost=0.00..5059137.00 rows=50000000 width=4) (actual
time=0.070..17633.821 rows=50000000 loops=1)
Foreign File: /home/pgsql/relation.csv
Foreign File Size: 484444500
Total runtime: 23213.909 ms
(5 rows)

Any comments and suggestions are welcomed.

Thanks,

Best regards,
Etsuro Fujita

Attachment Content-Type Size
file-fdw-pathkeys-0802.patch application/octet-stream 17.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2012-08-02 11:15:15 Re: Help me develop new commit_delay advice
Previous Message Qi Huang 2012-08-02 09:03:04 Git diff patch in context diff format