Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alex Tokarev <dwalin(at)dwalin(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)
Date: 2018-07-19 20:32:12
Message-ID: 20180719203212.qso3vgljwns75oho@alap3.anarazel.de
Lists: pgsql-hackers pgsql-performance

Hi,

On 2018-07-18 14:34:34 -0400, Robert Haas wrote:
> On Sat, Jul 7, 2018 at 4:01 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > FWIW, here's a rebased version of this patch. Could probably be polished
> > further. One might argue that we should make somewhat more wide-ranging
> > changes, to also unify scanint8 and pg_atoi. But it
> > might also just be worthwhile to apply without those, given the
> > performance benefit.
>
> Wouldn't hurt to do that one too, but might be OK to just do this
> much. Questions:
>
> 1. Why the error message changes? If there's a good reason, it should
> be done as a separate commit, or at least well-documented in the
> commit message.

Because there are a lot of "invalid input syntax for type %s: \"%s\""
error messages, and we shouldn't force translators to maintain a separate
version that inlines the first %s. But you're right, it'd be worthwhile
to point that out in the commit message.
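To illustrate the translation point: with one message template and the
type name passed as an argument, the catalog holds a single string no
matter how many types use it. In PostgreSQL this would go through
ereport()/errmsg(); the sketch below uses plain snprintf() and a
hypothetical helper, format_invalid_syntax(), so it runs standalone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One translatable template shared by every type's input function. */
#define INVALID_SYNTAX_FMT "invalid input syntax for type %s: \"%s\""

/*
 * Hypothetical helper: formats the shared message for a given type name
 * and offending input. Returns snprintf()'s result (the length that would
 * have been written, or a negative value on error).
 */
static int format_invalid_syntax(char *buf, size_t buflen,
                                 const char *type_name, const char *input)
{
    assert(buf != NULL && type_name != NULL && input != NULL);
    return snprintf(buf, buflen, INVALID_SYNTAX_FMT, type_name, input);
}
```

For example, format_invalid_syntax(buf, sizeof buf, "integer", "12a")
produces: invalid input syntax for type integer: "12a"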

> 2. Does the likely/unlikely stuff make a noticeable difference?

Yes. It's also largely a copy from existing code (scanint8), so I don't
really want to differ here.

> 3. If this is a drop-in replacement for pg_atoi, why not just recode
> pg_atoi this way -- or have it call this -- and leave the callers
> unchanged?

Because pg_atoi supports a variable 'terminator'. Supporting that would
make the code somewhat slower, without being particularly useful. I think
there's only a single in-core caller left after the patch
(int2vectorin). There's a fair argument that it should just be
open-coded to handle the weird space parsing, but given there are probably
external pg_atoi() callers, I'm not sure it's worth doing so?

I don't think it's a good idea to keep pg_atoi as a wrapper: it takes
a size argument, which makes efficient code hard.

> 4. Are we sure this is faster on all platforms, or could it work out
> the other way on, say, BSD?

I'd be *VERY* surprised if any platform's version turned out faster. It's
not easy to write a faster implementation than what I've proposed, and
especially not if you use strtol() as the API (variable bases, a bit of
locale support).
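For comparison, parsing an int32 through strtol() looks roughly like the
sketch below. The general-purpose API brings work a specialized parser
can skip: a base argument, leading-whitespace handling, possibly extra
locale-specific input forms, errno-based overflow reporting and, on LP64
platforms where long is 64 bits, a second range check to narrow to
int32. parse_int32_strtol() is a hypothetical name; the library calls
are standard C.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parses s as a base-10 int32 using strtol(). Leading whitespace is
 * skipped by strtol() itself; trailing whitespace is allowed here, any
 * other trailing character is rejected.
 */
static bool parse_int32_strtol(const char *s, int32_t *out)
{
    char *end;
    long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)                   /* no digits consumed */
        return false;
    if (errno == ERANGE)            /* overflowed long */
        return false;
    /* long may be wider than 32 bits, so check the int32 range separately */
    if (v < INT32_MIN || v > INT32_MAX)
        return false;
    while (isspace((unsigned char) *end))
        end++;
    if (*end != '\0')
        return false;
    *out = (int32_t) v;
    return true;
}
```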

Greetings,

Andres Freund
