Load TIME fields - proposed performance improvement

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Load TIME fields - proposed performance improvement
Date: 2020-09-22 01:42:55
Message-ID: CAHut+Pu89TWjq530V2gY5O6SWi=OEJMQ_VHMt8bdZB_9JFna5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Hackers.

I have a test table with multiple (10) columns defined as TIME WITHOUT
TIME ZONE.

When loading this table with a lot of data (e.g. "COPY tbl FROM
/my/path/2GB.csv WITH (FORMAT CSV)") I observed it was spending an
excessive amount of time within the function GetCurrentDateTime.

IIUC the code is calling GetCurrentDateTime only to acquire the
current TX timestamp as a struct pg_tm in order to derive some
timezone information.

My test table has 10 x TIME columns.
My test data has 22.5 million rows (~ 2GB)
So that's 225 million times the GetCurrentDateTime function is called
to populate the struct with the same values.

I have attached a patch which caches this struct, so now those 225
million calls are reduced to just 1 call.

~

Test Results:

Copy 22.5 million rows data (~ 2GB)

BEFORE
Run 1 = 4m 36s
Run 2 = 4m 30s
Run 3 = 4m 32s
perf showed 20.95% time in GetCurrentDateTime

AFTER (cached struct)
Run 1 = 3m 44s
Run 2 = 3m 44s
Run 3 = 3m 45s
perf shows no time in GetCurrentDateTime
~17% performance improvement in my environment. YMMV.

~

Thoughts?

Kind Regards
Peter Smith.
Fujitsu Australia.

Attachment Content-Type Size
DecodeTimeOnly_GetCurrentDateTime.patch application/octet-stream 1.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-09-22 02:12:45 Re: Load TIME fields - proposed performance improvement
Previous Message tsunakawa.takay@fujitsu.com 2020-09-22 01:17:38 RE: Transactions involving multiple postgres foreign servers, take 2