Re: Review/Pull Request: Adding new CRC32C implementation for IBM S390X

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Eduard Stefes <Eduard(dot)Stefes(at)ibm(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "brueckner(at)linux(dot)ibm(dot)com" <brueckner(at)linux(dot)ibm(dot)com>, "iii(at)linux(dot)ibm(dot)com" <iii(at)linux(dot)ibm(dot)com>
Subject: Re: Review/Pull Request: Adding new CRC32C implementation for IBM S390X
Date: 2025-06-02 04:48:11
Message-ID: CANWCAZYKH3jXb1YfYpwfuMe2AJAZJHHYcJAAvx96HeOSzOrHOQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, May 31, 2025 at 2:41 AM Eduard Stefes <Eduard(dot)Stefes(at)ibm(dot)com> wrote:
>
> On Thu, 2025-05-08 at 05:23 +0700, John Naylor wrote:
>
> > This case is a bit different, since Arm can compute hardware CRC on
> > any input size. The fast path here is only guaranteed to be taken at
> > inputs of 79 bytes or bigger, with the 64-byte main loop and 16-byte
> > alignment. For larger inputs, an indirect call shouldn't matter, so
> > to
> > keep it simple I'd recommend having only the runtime check. And for
> > smaller inputs call sb8 directly -- this will avoid the indirect call
> > which in turn just jumps to sb8. This is important because the server
> > computes CRC on a 20-byte input for the WAL record header while
> > holding a lock.
>
> I worked on the code and got it working on 16 byte chunks so the WAL
> writes will directly benefit from this. I will try to calculate and add
> the polynomials for the smaller chunks. Maybe we where wrong and we
> will still see a speed improvement, also for the smaller pieces.
> However that is something that I cannot tackle immediately. I'll come
> back to this in the summer.

The measurements from your previous message used 16-byte chunks -- did
that have the necessary constants to get the correct answer?

I'm not sure what you're in doubt about, but I guess it will be clear
with your next patch.

> > Something like:
> >
> > #define COMP_CRC32C(crc, data, len) \
> > ((crc) = pg_comp_crc32c_dispatch((crc), (data), (len)))
> >
> > static inline
> > pg_crc32c
> > pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len)
> > {
> > /*
> > if (len < VX_MIN_LEN + VX_ALIGN_MASK)
> > {
> > /*
> > * For inputs small enough that we may not take the vector path,
> > * just call sb8 directly.
> > */
> > return pg_comp_crc32c_sb8(crc, data, len);;
> > }
> > else
> > /* Otherwise, use a runtime check for S390_VX instructions. */
> > return pg_comp_crc32c(crc, data, len);
> > }
>
> Is static inline generally preferred here or should I do something
> like:
>
> #define COMP_CRC32C(crc, data, len) \
> ((crc) = (len) < MAX_S390X_CRC ? \
> pg_comp_crc32c_sb8((crc),(data),(len)) : \
> pg_comp_crc32c((crc), (data), (len)))

That macro is actually better -- the static inline was used for x86
because it has looping which is not the case for S390X.

> > Does the build still fall back to sb8 with a warning, or does it
> > fail?
> > It'd simpler if the guard against certain clang versions went into
> > the
> > choose function, so it can fall back to sb8.
>
> My current implementation will silently fall back to sb8 and not
> compile any of the s390x code.

That seems like a good design.

> Other possible strategies are:
>
> - print a warning and fall back to sb0
> - fail the compile and inform the user to switch compiler or to disable
> crc32vx in the build system
> - move all the checks into the .c file and to this with with macros
>
> indeed for zlib-ng we went the way to put all the checks into the code
> and fail the build if compiled with a known broken compiler.
>
> My arguments to not do the same in postgres are:
>
> - postgres uses sb8 on s390x already so by NOT changing it to another
> algorithm we do not change the expected behavior.
> - imho, this is something that the build system should check and take
> care of, not the code that's getting compiled.

I'm inclined to agree, but I wonder if these guards should go inside
the link check. I.e. the separate config machinery for "have bad
compiler" seems like unnecessary clutter.

--
John Naylor
Amazon Web Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2025-06-02 04:48:25 Suggestions for improving \conninfo output in v18
Previous Message Dilip Kumar 2025-06-02 04:14:20 Re: Speedup truncations of temporary relation forks