Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)
Date: 2022-05-19 16:09:46
Message-ID: CAEudQAr=5ud11j7foKqm8mT8iFFnZEu1OO614D6kMoYyTgaKAQ@mail.gmail.com
Lists: pgsql-hackers

Taking it a step further, I created a new commitfest entry for this patch,
targeting version 16:
https://commitfest.postgresql.org/38/3645/

Currently, the native memset is well optimized on several platforms,
including 64-bit Windows [1].

However, even the native memset has problems.
I reran David's memset.c test:

C:\usr\src\tests\memset>memset2 2000000000
Running 2000000000 loops
MemSet: size 8: 6.635000 seconds
MemSet: size 16: 6.594000 seconds
MemSet: size 32: 6.694000 seconds
MemSet: size 64: 9.002000 seconds
MemSet: size 128: 10.598000 seconds
MemSet: size 256: 25.061000 seconds
MemSet: size 512: 27.365000 seconds
memset: size 8: 0.594000 seconds
memset: size 16: 0.595000 seconds
memset: size 32: 1.189000 seconds
memset: size 64: 2.378000 seconds
memset: size 128: 4.753000 seconds
memset: size 256: 24.391000 seconds
memset: size 512: 27.064000 seconds

Both MemSet and memset perform very poorly at sizes 256 and 512.
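
A rough sketch of the kind of micro-benchmark behind these numbers follows.
It is not David's memset.c, which is not reproduced in this message. The
helper names (sketch_memset, sketch_fill_matches, sketch_time_fill), the
1024-byte loop limit, the 512-byte buffer, and the use of clock() are all
assumptions made for illustration. The idea is a MemSet-style fill that
stores one long at a time for small, aligned, zero-valued requests and falls
back to memset() otherwise, timed against plain memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Assumed values for this sketch only. */
#define SKETCH_LOOP_LIMIT 1024
#define SKETCH_LONG_MASK  (sizeof(long) - 1)
#define SKETCH_BUF_BYTES  512

/* Long-aligned scratch buffer shared by the helpers below. */
static long sketch_buf[SKETCH_BUF_BYTES / sizeof(long)];

/*
 * MemSet-style fill (hypothetical helper): for small, long-aligned,
 * zero-valued requests store one long at a time; otherwise call memset().
 */
static void
sketch_memset(void *start, int val, size_t len)
{
    if ((((uintptr_t) start) & SKETCH_LONG_MASK) == 0 &&
        (len & SKETCH_LONG_MASK) == 0 &&
        val == 0 &&
        len <= SKETCH_LOOP_LIMIT)
    {
        long *p = (long *) start;
        long *stop = (long *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
    }
    else
        memset(start, val, len);
}

/* Returns 1 if sketch_memset() zeroes the first `size` bytes, as memset() would. */
static int
sketch_fill_matches(size_t size)
{
    static const long zeros[SKETCH_BUF_BYTES / sizeof(long)] = {0};

    assert(size <= SKETCH_BUF_BYTES);
    memset(sketch_buf, 0xFF, SKETCH_BUF_BYTES);   /* dirty the buffer first */
    sketch_memset(sketch_buf, 0, size);
    return memcmp(sketch_buf, zeros, size) == 0;
}

/* Time `loops` fills of `size` bytes; returns elapsed CPU seconds. */
static double
sketch_time_fill(int use_loop_variant, size_t size, long loops)
{
    volatile long sink = 0;   /* reading the buffer discourages dead-store removal */
    clock_t begin;
    long i;

    assert(size <= SKETCH_BUF_BYTES);
    begin = clock();
    for (i = 0; i < loops; i++)
    {
        if (use_loop_variant)
            sketch_memset(sketch_buf, 0, size);
        else
            memset(sketch_buf, 0, size);
        sink += sketch_buf[0];
    }
    return (double) (clock() - begin) / CLOCKS_PER_SEC;
}
```

Calling sketch_time_fill(1, size, loops) and sketch_time_fill(0, size, loops)
for sizes 8 through 512 would produce a table shaped like the one above.
Absolute numbers will depend heavily on the compiler and optimization flags.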

Still, I believe it is worth removing the use of MemSet, because its usage
is ad hoc and it has been mixed with memset in several places in the code,
without any clear criteria.
Using only memset makes the choice simpler for developers, and there do not
seem to be any regressions from removing MemSet.
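
To make the proposed substitution concrete, here is a minimal sketch. The
struct and function names are hypothetical and are not taken from relcache.c
or from the attached patches. It only shows the shape of the change: a call
written in MemSet style becomes a plain memset() call with the same
arguments.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct, for illustration only. */
typedef struct SketchEntry
{
    int  id;
    long counters[4];
    char name[16];
} SketchEntry;

/* Zero the whole entry with the standard library call. */
static void
sketch_entry_reset(SketchEntry *entry)
{
    assert(entry != NULL);
    /* MemSet style would be: MemSet(entry, 0, sizeof(*entry)); */
    memset(entry, 0, sizeof(*entry));
}

/* Returns 1 if the sampled fields read back as zero after a reset. */
static int
sketch_entry_reset_works(void)
{
    SketchEntry e = { 42, { 1, 2, 3, 4 }, "relcache" };

    sketch_entry_reset(&e);
    return e.id == 0 &&
           e.counters[0] == 0 &&
           e.counters[3] == 0 &&
           e.name[0] == '\0';
}
```

With a single spelling there is no per-call-site decision about which variant
to use, which is the simplification argued for above.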

Windows 10 64 bit
msvc 2019 64 bit
RAM 8GB
SSD 256GB
Postgres (15beta1 with original configuration)

1. pgbench -c 50 -T 300 -S -n -U postgres
HEAD:
pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 50
number of threads: 1
maximum number of tries: 1
duration: 300 s
number of transactions actually processed: 10448967
number of failed transactions: 0 (0.000%)
latency average = 1.432 ms
initial connection time = 846.186 ms
tps = 34926.861987 (without initial connection time)

PATCHED (without MemSet)
pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 50
number of threads: 1
maximum number of tries: 1
duration: 300 s
number of transactions actually processed: 10655332
number of failed transactions: 0 (0.000%)
latency average = 1.404 ms
initial connection time = 866.203 ms
tps = 35621.045750 (without initial connection time)

2.
CREATE TABLE t_test (x numeric);
INSERT INTO t_test SELECT random()
FROM generate_series(1, 5000000);
ANALYZE;
SHOW work_mem;

HEAD:
postgres=# explain analyze SELECT * FROM t_test ORDER BY x;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=397084.73..883229.71 rows=4166666 width=11) (actual time=1328.331..2743.310 rows=5000000 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  -> Sort (cost=396084.71..401293.04 rows=2083333 width=11) (actual time=1278.442..1513.510 rows=1666667 loops=3)
       Sort Key: x
       Sort Method: external merge Disk: 25704kB
       Worker 0: Sort Method: external merge Disk: 23960kB
       Worker 1: Sort Method: external merge Disk: 23960kB
       -> Parallel Seq Scan on t_test (cost=0.00..47861.33 rows=2083333 width=11) (actual time=0.234..128.607 rows=1666667 loops=3)
Planning Time: 0.064 ms
Execution Time: 2863.381 ms
(11 rows)

PATCHED:
postgres=# explain analyze SELECT * FROM t_test ORDER BY x;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=397084.73..883229.71 rows=4166666 width=11) (actual time=1309.703..2705.027 rows=5000000 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  -> Sort (cost=396084.71..401293.04 rows=2083333 width=11) (actual time=1281.111..1515.928 rows=1666667 loops=3)
       Sort Key: x
       Sort Method: external merge Disk: 24880kB
       Worker 0: Sort Method: external merge Disk: 24776kB
       Worker 1: Sort Method: external merge Disk: 23960kB
       -> Parallel Seq Scan on t_test (cost=0.00..47861.33 rows=2083333 width=11) (actual time=0.260..130.277 rows=1666667 loops=3)
Planning Time: 0.060 ms
Execution Time: 2825.201 ms
(11 rows)
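
The HEAD and PATCHED figures from both tests can be put in relative terms
with a small helper. This is only an illustrative sketch; the function name
is hypothetical and the inputs are the tps and Execution Time values quoted
above.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double
sketch_percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For example, sketch_percent_change(34926.861987, 35621.045750) gives roughly
+2% for the pgbench tps, and sketch_percent_change(2863.381, 2825.201) gives
roughly -1.3% for the sort execution time. Both are small differences from
single runs.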

I am leaving MemSetAligned and MemSetLoop for a later step.

regards,
Ranier Vilela

[1]
https://msrc-blog.microsoft.com/2021/01/11/building-faster-amd64-memset-routines/

Attachment Content-Type Size
001_avoid_unecessary_memset_call.patch application/octet-stream 486 bytes
002_refactoring_memset_api_usage.patch application/octet-stream 134.9 KB
