| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Load TIME fields - proposed performance improvement |
| Date: | 2020-09-22 01:42:55 |
| Message-ID: | CAHut+Pu89TWjq530V2gY5O6SWi=OEJMQ_VHMt8bdZB_9JFna5A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Hackers.
I have a test table with multiple (10) columns defined as TIME WITHOUT
TIME ZONE.
When loading this table with a lot of data (e.g. "COPY tbl FROM
/my/path/2GB.csv WITH (FORMAT CSV)") I observed it was spending an
excessive amount of time within the function GetCurrentDateTime.
IIUC the code is calling GetCurrentDateTime only to acquire the
current TX timestamp as a struct pg_tm in order to derive some
timezone information.
My test table has 10 x TIME columns.
My test data has 22.5 million rows (~ 2GB)
So that's 225 million times the GetCurrentDateTime function is called
to populate the struct with the same values.
I have attached a patch which caches this struct, so now those 225
million calls are reduced to just 1 call.
~
Test Results:
Copy 22.5 million rows data (~ 2GB)
BEFORE
Run 1 = 4m 36s
Run 2 = 4m 30s
Run 3 = 4m 32s
perf showed 20.95% time in GetCurrentDateTime
AFTER (cached struct)
Run 1 = 3m 44s
Run 2 = 3m 44s
Run 3 = 3m 45s
perf shows no time in GetCurrentDateTime
~17% performance improvement in my environment. YMMV.
~
Thoughts?
Kind Regards
Peter Smith.
Fujitsu Australia.
| Attachment | Content-Type | Size |
|---|---|---|
| DecodeTimeOnly_GetCurrentDateTime.patch | application/octet-stream | 1.0 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2020-09-22 02:12:45 | Re: Load TIME fields - proposed performance improvement |
| Previous Message | tsunakawa.takay@fujitsu.com | 2020-09-22 01:17:38 | RE: Transactions involving multiple postgres foreign servers, take 2 |