Re: Query execution in Perl TAP tests needs work

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Query execution in Perl TAP tests needs work
Date: 2023-08-28 13:23:16
Message-ID: 4f3038c7-2b37-2625-c4c6-ebf7cbcb076d@dunslane.net
Lists: pgsql-hackers


On 2023-08-28 Mon 01:29, Thomas Munro wrote:
> Hi,
>
> Every time we run a SQL query, we fork a new psql process and a new
> cold backend process. It's not free on Unix, and quite a lot worse on
> Windows, at around 70ms per query. Take amcheck/001_verify_heapam for
> example. It runs 272 subtests firing off a stream of queries, and
> completes in ~51s on Windows (!), and ~6-9s on the various Unixen, on
> CI.
>
> Here are some timestamps I captured from CI by instrumenting various
> Perl and C bits:
>
> 0.000s: IPC::Run starts
> 0.023s: postmaster socket sees connection
> 0.025s: postmaster has created child process
> 0.033s: backend starts running main()
> 0.039s: backend has reattached to shared memory
> 0.043s: backend connection authorized message
> 0.046s: backend has executed and logged query
> 0.070s: IPC::Run returns
>
> I expected process creation to be slow on that OS, but it seems like
> something happening at the end is even slower. CI shows Windows
> consuming 4 CPUs at 100% for a full 10 minutes to run a test suite
> that finishes in 2-3 minutes everywhere else with the same number of
> CPUs. Could there be an event handling snafu in IPC::Run or elsewhere
> nearby? It seems like there must be either a busy loop or a busted
> sleep/wakeup... somewhere? But even if there's a weird bug here
> waiting to be discovered and fixed, I guess it'll always be too slow
> at ~10ms per process spawned, with two processes to spawn, and it's
> bad enough on Unix.
>
> As an experiment, I hacked up a not-good-enough-to-share experiment
> where $node->safe_psql() would automatically cache a BackgroundPsql
> object and reuse it, and the times for that test dropped ~51 -> ~9s on
> Windows, and ~7 -> ~2s on the Unixen. But even that seems non-ideal
> (well it's certainly non-ideal the way I hacked it up anyway...). I
> suppose there are quite a few ways we could do better:
>
> 1. Don't fork anything at all: open (and cache) a connection directly
> from Perl.
> 1a. Write xsub or ffi bindings for libpq. Or vendor (parts) of the
> popular Perl xsub library?
> 1b. Write our own mini pure-perl pq client module. Or vendor (parts)
> of some existing one.
> 2. Use long-lived psql sessions.
> 2a. Something building on BackgroundPsql.
> 2b. Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
> protocol that is more fun to talk to from Perl/machines?
>
> In some other languages one can do FFI pretty easily so we could use
> the in-tree libpq without extra dependencies:
>
>>>> import ctypes
>>>> libpq = ctypes.cdll.LoadLibrary("/path/to/libpq.so")
>>>> libpq.PQlibVersion()
> 170000
>
> ... but it seems you can't do either static C bindings or runtime FFI
> from Perl without adding a new library/package dependency. I'm not
> much of a Perl hacker so I don't have any particular feeling. What
> would be best?
>
> This message brought to you by the Lorax.
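
Turning the quoted CI timestamps into per-stage gaps makes the point about the tail clearer. The sketch below only does arithmetic on the numbers Thomas posted, and it measures nothing new. It's in Python, like his ctypes example:

```python
# Timestamps (seconds) copied from the CI instrumentation quoted above.
events = [
    (0.000, "IPC::Run starts"),
    (0.023, "postmaster socket sees connection"),
    (0.025, "postmaster has created child process"),
    (0.033, "backend starts running main()"),
    (0.039, "backend has reattached to shared memory"),
    (0.043, "backend connection authorized message"),
    (0.046, "backend has executed and logged query"),
    (0.070, "IPC::Run returns"),
]


def stage_gaps(events):
    """Return (label, whole milliseconds since the previous event) for each later event."""
    return [
        (label, round((t - prev_t) * 1000))
        for (prev_t, _), (t, label) in zip(events, events[1:])
    ]


for label, ms in stage_gaps(events):
    print(f"{ms:3d} ms  -> {label}")
```

The output shows 23 ms pass before the postmaster even sees the connection, and another 24 ms pass between the query being logged and IPC::Run returning. Roughly two-thirds of the 70 ms is therefore spent outside the backend, which fits the suspicion that the slow part is on the client/process-management side.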

Thanks for raising this. Windows test times have bothered me for ages.

The standard Perl DBI library has a connect_cached method. Of course, we
don't want to depend on it, especially as we might have changed libpq in
what we're testing (DBI would go through DBD::Pg and whatever libpq it
was built against), and it would place a substantial new burden on
testers such as buildfarm owners.
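
The caching idea itself doesn't require DBI. Below is a minimal Python sketch of connect_cached-style reuse. `ConnectionCache` and `FakeConnection` are hypothetical names, and the fake connection stands in for whatever we'd actually cache, such as a BackgroundPsql session. As I understand DBI's behaviour, a cached handle is reused only while it still passes a liveness check, and `ping()` models that here:

```python
from typing import Callable, Dict


class FakeConnection:
    """Hypothetical stand-in for a real connection (e.g. a long-lived psql session)."""

    opened = 0  # counts how many "expensive" connections have been created

    def __init__(self, dsn: str):
        self.dsn = dsn
        self.closed = False
        FakeConnection.opened += 1

    def ping(self) -> bool:
        # A real implementation would run a trivial query; here we only check a flag.
        return not self.closed


class ConnectionCache:
    """Keep one connection per DSN; reconnect only if the cached one is dead."""

    def __init__(self, factory: Callable[[str], FakeConnection]):
        self._factory = factory
        self._cache: Dict[str, FakeConnection] = {}

    def connect_cached(self, dsn: str) -> FakeConnection:
        conn = self._cache.get(dsn)
        if conn is None or not conn.ping():
            conn = self._factory(dsn)  # pay the connection cost only when needed
            self._cache[dsn] = conn
        return conn
```

In a TAP harness, the cache key would presumably be the node plus the database name. The liveness check matters when a test deliberately kills or restarts the server.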

I like the idea of using a pure-Perl pq implementation, not least
because it could expand our ability to test things at the protocol
level. Not sure how much work it would be. I'm willing to help if we
want to go that way.
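
For a sense of the size of the job: the framing of v3 protocol messages is simple. The Python sketch below builds a StartupMessage and a simple-query ('Q') message and splits a backend message off a buffer. The function names are hypothetical. Authentication, socket handling, and error handling, which a real client would need, are all omitted:

```python
import struct

PROTOCOL_V3 = 3 << 16  # 196608: protocol version 3.0


def startup_message(user: str, database: str) -> bytes:
    """StartupMessage: Int32 length, Int32 protocol, key\\0value\\0 pairs, final \\0."""
    body = struct.pack("!i", PROTOCOL_V3)
    for key, value in (("user", user), ("database", database)):
        body += key.encode() + b"\0" + value.encode() + b"\0"
    body += b"\0"
    return struct.pack("!i", len(body) + 4) + body  # length includes itself


def query_message(sql: str) -> bytes:
    """Simple query: type byte 'Q', Int32 length (self-inclusive), NUL-terminated SQL."""
    payload = sql.encode() + b"\0"
    return b"Q" + struct.pack("!i", len(payload) + 4) + payload


def split_message(buf: bytes):
    """Split one complete backend message off buf: returns (type, body, rest).

    Assumes buf already holds the whole message; a real client would
    keep reading from the socket until it does.
    """
    msg_type = buf[:1]
    (length,) = struct.unpack("!i", buf[1:5])  # counts itself, not the type byte
    return msg_type, buf[5:1 + length], buf[1 + length:]
```
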

Yes, you need an external library to use FFI in Perl, but there's one
that's pretty tiny. See <https://metacpan.org/pod/FFI::Library>. There
is also FFI::Platypus, but it involves building a library. OTOH, that's
the one that's available as a standard package on my Fedora and Ubuntu
systems. I haven't tried using either. Maybe we could add some logic
that uses the FFI interface if it's available and falls back on current
usage otherwise.
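
The detect-and-fall-back logic is easy to picture. Here it is sketched with Python's ctypes rather than FFI::Library or FFI::Platypus, purely for illustration. `load_libpq` and `choose_query_backend` are hypothetical names. A real harness would more likely load the libpq built in the tree by explicit path than whatever `find_library` turns up:

```python
import ctypes
import ctypes.util


def load_libpq(name: str = "pq"):
    """Return a ctypes handle to the library, or None if it can't be found or loaded."""
    path = ctypes.util.find_library(name)  # None when nothing matching is installed
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


def choose_query_backend(name: str = "pq") -> str:
    """Use 'ffi' when the library loads; otherwise fall back to spawning psql ('fork')."""
    return "ffi" if load_libpq(name) is not None else "fork"
```

Ideally, tests that only call safe_psql() and friends wouldn't need to care which path was taken.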

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
