Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alex Tokarev <dwalin(at)dwalin(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)
Date: 2017-12-08 21:44:37
Message-ID: 20171208214437.qgn6zdltyq5hmjpk@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Hi,

On 2017-12-08 10:17:34 -0800, Andres Freund wrote:
> the strtoll is libc functionality triggered by pg_atoi(), something I've
> seen show up in numerous profiles. I think it's probably time to have
> our own optimized version of it rather than relying on libcs.

Attached is a hand-rolled version. After quickly hacking up one from
scratch, I noticed we already kind of have one for int64 (scanint8), so
I changed the structure of this one to be relatively similar.

It's currently using the overflow logic from [1], but that's not
fundamentally required, we could rely on fwrapv for this one too.

This one improves performance of the submitted workload from 1223.950ms
to 1020.640ms (best of three). The profile's shape changes quite
noticeably:

master:
+ 15.38% postgres libc-2.25.so [.] __GI_____strtoll_l_internal
+ 11.79% postgres postgres [.] heap_fill_tuple
+ 8.00% postgres postgres [.] CopyFrom
+ 7.40% postgres postgres [.] CopyReadLine
+ 6.79% postgres postgres [.] ExecConstraints
+ 6.68% postgres postgres [.] NextCopyFromRawFields
+ 6.36% postgres postgres [.] heap_compute_data_size
+ 6.02% postgres postgres [.] pg_atoi
patch:
+ 13.70% postgres postgres [.] heap_fill_tuple
+ 10.46% postgres postgres [.] CopyFrom
+ 9.31% postgres postgres [.] pg_strto32
+ 8.39% postgres postgres [.] CopyReadLine
+ 7.88% postgres postgres [.] ExecConstraints
+ 7.63% postgres postgres [.] InputFunctionCall
+ 7.41% postgres postgres [.] heap_compute_data_size
+ 7.21% postgres postgres [.] pg_verify_mbstr
+ 5.49% postgres postgres [.] NextCopyFromRawFields

This probably isn't going to resolve Alex's performance concerns
meaningfully, but seems quite worthwhile to do anyway.

We probably should have int8/16/64 version coded just as use the 32bit
version, but I decided to leave that out for now. Primarily interested
in comments. Wonder a bit whether it's worth providing an 'errorOk'
mode like scanint8 does, but surveying its callers suggests we should
rather change them to not need it...

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20171030112751.mukkriz2rur2qkxc%40alap3.anarazel.de

Attachment Content-Type Size
0001-Provide-overflow-safe-integer-math-inline-functions.patch text/x-diff 11.2 KB
0001-Hand-code-string-to-int32-conversion-for-performance.patch text/x-diff 3.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2017-12-08 21:49:57 Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed
Previous Message Alexander Korotkov 2017-12-08 21:38:39 Re: [HACKERS] Re: Is anything preventing us from allowing write to foreign tables from standby?

Browse pgsql-performance by date

  From Date Subject
Next Message Sam Gendler 2017-12-09 04:47:13 Re: Learning EXPLAIN
Previous Message Andres Freund 2017-12-08 18:17:34 Re: Table with large number of int columns, very slow COPY FROM