From: | "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | <pgsql-hackers(at)postgresql(dot)org> |
Subject: | WIP Patch: Selective binary conversion of CSV file foreign tables |
Date: | 2012-05-08 11:26:02 |
Message-ID: | 001801cd2d0d$59dec990$0d9c5cb0$@lab.ntt.co.jp |
Lists: | pgsql-hackers |
I would like to propose improving the parsing efficiency of contrib/file_fdw
by means of the selective parsing proposed by Alagiannis et al. [1]. That is,
for a CSV/TEXT file foreign table, file_fdw would perform binary conversion
only for the columns needed for query processing. Attached is a WIP patch
implementing the feature.
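To illustrate the idea, here is a minimal sketch in Python rather than the C of the attached patch. It assumes the pgbench history column layout (tid, bid, aid, delta, mtime, filler); the names COLUMNS, parse_row and scan are hypothetical and do not come from the patch. Every line is still split into fields, but text-to-binary conversion runs only for the columns the query references.

```python
import csv
import io
from datetime import datetime

# Hypothetical per-column converters for the pgbench history layout.
# These stand in for the type input functions; they are not from the patch.
COLUMNS = [
    ("tid", int),
    ("bid", int),
    ("aid", int),
    ("delta", int),
    ("mtime", datetime.fromisoformat),
    ("filler", str),
]


def parse_row(fields, needed):
    """Convert only the columns named in `needed`; leave the rest as None."""
    row = {}
    for (name, convert), raw in zip(COLUMNS, fields):
        row[name] = convert(raw) if name in needed else None
    return row


def scan(text, needed):
    """Yield one row per CSV line, converting only the needed columns."""
    for fields in csv.reader(io.StringIO(text)):
        yield parse_row(fields, needed)


data = (
    "1,1,42,-100,2012-05-08 11:26:02,\n"
    "2,1,7,250,2012-05-08 11:26:03,\n"
)

# SELECT count(*): no column is referenced, so nothing is converted.
count = sum(1 for _ in scan(data, needed=set()))

# SELECT sum(delta): only the delta column is converted.
total = sum(row["delta"] for row in scan(data, needed={"delta"}))
```

In this sketch the count(*) case skips all six conversions per row, including the comparatively expensive timestamp parse, which is where the saving would be expected to come from.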
I evaluated the efficiency of the patch by running SELECT count(*) on a CSV
file foreign table of 5,000,000 records, which had the same definition as the
pgbench history table. The runs were done on a single core of a 3.00GHz Intel
Xeon CPU with 8GB of RAM. All configuration settings were left at their
defaults.
w/o the patch: 7255.898 ms
w/ the patch: 3363.297 ms
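For reference, the relative improvement implied by these two timings can be worked out as follows. The variable names are only illustrative; the two numbers are the ones reported above.

```python
# Timings reported above, in milliseconds.
without_patch_ms = 7255.898
with_patch_ms = 3363.297

# Ratio of the two runs and the fraction of time saved.
speedup = without_patch_ms / with_patch_ms
reduction = 1 - with_patch_ms / without_patch_ms

print(f"{speedup:.2f}x faster, {reduction:.1%} less time")
# prints "2.16x faster, 53.6% less time"
```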
In light of [2], I think it would be better to disable this feature when the
validation option is set to 'true'. In that case file_fdw would convert all
columns to binary representation, and so verify that each tuple conforms to
all column data types as well as all kinds of constraints.
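The interaction with validation can be sketched as below. This is an assumption-laden illustration in Python, not the patch's code: the `validate` flag stands for the validation option discussed in [2], and CONVERTERS, columns_to_convert and convert_row are hypothetical names. With validation off, a malformed value in an unreferenced column passes unnoticed; with validation on, every column is converted and the error surfaces.

```python
# Hypothetical converters for two columns of the table.
CONVERTERS = {"aid": int, "delta": int}


def columns_to_convert(referenced, validate):
    """Return every column when validating, else only the referenced ones."""
    return set(CONVERTERS) if validate else set(referenced)


def convert_row(raw_row, referenced, validate):
    """Convert the selected columns of one row; others are left as None."""
    targets = columns_to_convert(referenced, validate)
    return {
        name: CONVERTERS[name](raw) if name in targets else None
        for name, raw in raw_row.items()
    }


# The aid field is malformed, but the query references only delta.
bad_row = {"aid": "not-a-number", "delta": "250"}

# Selective conversion: aid is never converted, so the row is accepted.
lenient = convert_row(bad_row, {"delta"}, validate=False)

# Validation on: all columns are converted, so the bad aid value is caught.
try:
    convert_row(bad_row, {"delta"}, validate=True)
    strict_error = None
except ValueError as exc:
    strict_error = exc
```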
I would appreciate your comments.
Best regards,
Etsuro Fujita
[1] http://homepages.cwi.nl/~idreos/NoDBsigmod2012.pdf
[2] https://commitfest.postgresql.org/action/patch_view?id=822
Attachment | Content-Type | Size |
---|---|---|
file_fdw_sel_bin_conv_v1.patch | application/octet-stream | 8.5 KB |