Re: PostgreSQL and Real Application Testing (RAT)

From: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
To: ROS Didier <didier(dot)ros(at)edf(dot)fr>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PostgreSQL and Real Application Testing (RAT)
Date: 2019-08-28 00:32:44
Message-ID: CANNMO+K8TL8SMTwQZ0JBBMb1JX9TX=kTp1b+eUTzmktTcLW-gw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 27, 2019 at 3:47 AM ROS Didier <didier(dot)ros(at)edf(dot)fr> wrote:

> Hi
>
>
>
> In my business, one of the things blocking the migration from Oracle to
> PostgreSQL is not having the equivalent of Oracle Real Application Testing .
>
> This product captures a workload in production and replays it in a test
> environment.
>
> This makes it possible to assess the impact of migrating to a newer
> version, creating an index, etc.
>
> is there an equivalent in the PostgreSQL community?
>
> if not, do you think it's technically possible to do it ?
>
> who would be interested in this project ?
>

Replaying a workload might or might not apply well to your case.

There are several major difficulties if you want to replay a workload:

1) How to "record" the workload. You need to write all your queries to the
Postgres log. Three problems here:
1a) pgreplay expects log_statement = 'all', while you might prefer
dealing with log_min_duration_statement instead. This is a minor issue,
though, quite easy to solve with preprocessing.
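Such preprocessing can be sketched as follows, assuming stderr log lines where log_min_duration_statement produces entries of the form "LOG:  duration: 1.234 ms  statement: ..." (the exact layout depends on your log_line_prefix; this is an illustrative assumption, not pgreplay's own code):

```python
import re

# Hedged sketch: with log_min_duration_statement = 0, a logged query line
# contains "LOG:  duration: 1.234 ms  statement: SELECT ...", while a
# log_statement = 'all' line contains "LOG:  statement: SELECT ...".
# Stripping the "duration: ... ms" fragment brings the former closer to
# the latter.
DURATION_RE = re.compile(r"(LOG:\s+)duration: [0-9.]+ ms\s+(statement|execute)")

def strip_duration(line: str) -> str:
    """Rewrite one log line; lines without a duration prefix pass through."""
    return DURATION_RE.sub(r"\1\2", line)
```

Running every log line through such a filter before feeding the file to the replay tool is usually the whole job.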
1b) under heavy load, log_min_duration_statement = 0 (or log_statement =
'all') will lead to performance degradation or even downtime. Possible
solutions are: write to memory, or don't write at all but send the queries
over the network.
1c) ideally, recording just the queries is not enough. To replay a workload
"as is", we need to replay the queries with known plans. There is no easy
solution to this problem in the Postgres ecosystem yet.

A couple of additional points regarding items 1b and 1c. In Postgres 12,
there is a cool new capability: sampling for query logging, implemented by
Adrien Nayrat https://commitfest.postgresql.org/20/1691/ With this, it will
be possible to fully log, say, 5% of all transactions and use that for
replaying. Moreover, with auto_explain, it will be possible to have plans!
Open questions are: (a) how to determine if N% is enough, and (b) how to
replay with the specified plans. [If anyone is interested in working in this
direction – please reach out to me.]
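As a rough illustration, such a setup might look like the following postgresql.conf fragment on Postgres 12 (parameter names here are assumptions based on the commitfest entry above and auto_explain's documented settings; verify against your server version before use):

```ini
# Sample ~5% of transactions, logging all their statements
log_transaction_sample_rate = 0.05

# Capture execution plans for logged statements via auto_explain
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 0
auto_explain.log_analyze = on
```

Note that auto_explain.log_analyze adds its own overhead, so the sampling rate and these settings need to be tuned together.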

2) Issues with the replaying itself. I can highlight at least two problems
here:
2a) pgreplay might not be enough for your workload; it doesn't scale
well. If interested, look at its analog written in Go,
https://github.com/gocardless/pgreplay-go, but this is quite a young
project.
2b) Postgres logs have millisecond precision (if you switched from %t to
%m in log_line_prefix), which might not be enough. There is a patch for
microsecond precision from David Fetter
https://www.postgresql.org/message-id/flat/20181023185050.GE6049%40fetter.org,
but that conversation hasn't yet led to a commit.

Another approach you might be interested in -- workload simulation. This is
what we (Postgres.ai) now use most of the time when building "lab"
environments for our clients. The idea is as follows:
- carefully analyze the workload using pg_stat_statements (here, our
open-source tool called "postgres-checkup"
https://gitlab.com/postgres-ai/postgres-checkup might be helpful, see the
reports in section K),
- take the most resource-consuming query groups (Top-N ordered by
total_time),
- create a set of files with statements with randomly filled parameters
(this won't work for most cases; I discuss the restrictions below),
- use pgbench and feed the workload files to it, using multiple -f options
with balancing (-f filename(at)XX, where XX is to be taken from the
pg_stat_statements analysis -- this time, "calls" and their ratio in the
whole workload will be needed; again, postgres-checkup can help here),
- run, analyze, compare behavior.
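The balancing step above can be sketched in a few lines, assuming per-query-group "calls" counts taken from pg_stat_statements (file names and numbers below are made up for illustration; pgbench's -f filename@weight syntax takes integer weights):

```python
# Hedged sketch: turn "calls" counts per query group into pgbench
# "-f file@weight" arguments proportional to each group's share of the
# total workload.
def pgbench_weight_args(calls_by_file, scale=100):
    """Map each workload file to an integer weight proportional to its
    share of total calls (pgbench's -f filename@weight syntax)."""
    total = sum(calls_by_file.values())
    args = []
    for fname, calls in sorted(calls_by_file.items()):
        weight = max(1, round(scale * calls / total))  # keep every file in play
        args.extend(["-f", f"{fname}@{weight}"])
    return args
```

The resulting arguments would then be passed to something like `pgbench -c 16 -j 4 -T 600 -f q1.sql@90 -f q2.sql@9 -f q3.sql@1 dbname` (connection counts and duration chosen to match the production profile you are simulating).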

Restrictions of this approach are obvious:
- it doesn't work well if most of your transactions have multiple
statements,
- in many cases, randomization is hard (it's not obvious how to organize it;
a synthetic approach is far from the real data distribution in storage and
workload; etc.),
- the approach requires a significant amount of manual effort.

However, the "workload simulation" approach is extremely helpful in many
cases, helping with change management. It doesn't require anything that
might negatively affect your production workload, and it utilizes pgbench
(or any other tool of your choice), which is reliable, has great features,
and scales well.

You might be interested in looking at our tool that we built to conduct a
huge amount of DB experiments, Nancy CLI
https://gitlab.com/postgres-ai/nancy. It supports both "workload replay"
method (with pgreplay) and "workload simulation" (with pgbench). PM me if
you're interested in discussing details.

Thanks,
Nik
