Re: Non-decimal integer literals

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Non-decimal integer literals
Date: 2022-11-28 23:43:33
Message-ID: CAApHDvpBTHL58j6UmnUyrz=5h15NhY30_AvZ_5fQ7LJMhMcyxA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 26 Nov 2022 at 05:13, Peter Eisentraut
<peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
>
> On 24.11.22 10:13, David Rowley wrote:
> > I
> > remember many years ago and several jobs ago when working with SQL
> > Server being able to speed up importing data using hexadecimal
> > DATETIMEs. I can't think why else you might want to represent a
> > DATETIME as a hexstring, so I assumed this was a large part of the use
> > case for INTs in PostgreSQL. Are you telling me that better
> > performance is not something anyone will want out of this feature?
>
> This isn't about datetimes but about integers.

I'm aware. My aim was to show that hex is commonly used as a more
efficient way of getting integer numbers in and out of computers.

Likely it's better for me to quantify this performance increase claim
with some actual performance results.

Here's master (@f0cd57f85) doing copy ab2 from '/tmp/ab.csv';

ab2 is a table with no indexes and just 2 int columns.

16.55% postgres [.] CopyReadLine
7.82% postgres [.] pg_strtoint32
7.60% postgres [.] CopyReadAttributesText
7.06% postgres [.] NextCopyFrom
4.40% postgres [.] CopyFrom

The copy completes in 2512.5278 ms (average time over 10 runs)

Patching master with your v11 patch and copying in hex numbers instead
of decimal numbers shows:

14.39% postgres [.] CopyReadLine
8.60% postgres [.] pg_strtoint32
6.95% postgres [.] NextCopyFrom
6.79% postgres [.] CopyReadAttributesText
4.81% postgres [.] CopyFrom

This shows that we're spending proportionally less time in
CopyReadLine() and proportionally more time in pg_strtoint32(). There
are probably two things going on there, CopyReadLine is likely faster
due to having to read fewer bytes and pg_strtoint32() is likely slower
due to additional branching and code size.

This (copy ab2 from '/tmp/abhex.csv') saw an average time of 2720.1387
ms over 10 runs.

Patching master with your v11 patch +
more_efficient_hex_oct_and_binary_processing.diff

15.68% postgres [.] CopyReadLine
7.75% postgres [.] NextCopyFrom
7.73% postgres [.] pg_strtoint32
6.25% postgres [.] CopyReadAttributesText
4.76% postgres [.] CopyFrom

The average time to import the hex version of the csv file comes down
to 2385.7298 ms over 10 runs.

I didn't run any tests to see how much the performance of importing
the decimal representation slowed down from the v11 patch. I assume
there will be a small performance hit due to the extra processing done
in pg_strtoint32()

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-11-28 23:48:44 Re: Allow file inclusion in pg_hba and pg_ident files
Previous Message Robert Haas 2022-11-28 23:42:48 Re: fixing CREATEROLE