Re: pgbench - extend initialization phase control

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: btendouan <btendouan(at)oss(dot)nttdata(dot)com>, "ibrar(dot)ahmad(at)gmail(dot)com:" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - extend initialization phase control
Date: 2019-10-30 10:08:58
Message-ID: CAHGQGwHWEyTXxZh46qgFY8a2bDF_EYeUdp3+_Hy=qLZSzwVPKg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 28, 2019 at 10:36 PM Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
>
> Hello Masao-san,
>
> >> Maybe. If you cannot check, you can only guess. Probably it should be
> >> small, but the current version does not allow to check whether it is so.
> >
> > Could you elaborate what you actually want to measure the performance
> > impact by adding explicit begin and commit? Currently pgbench -i issues
> > the following queries. The data generation part is already executed within
> > single transaction. You want to execute not only data generation but also
> > drop/creation of tables within single transaction, and measure how much
> > performance impact happens? I'm sure that would be negligible.
> > Or you want to execute data generate in multiple transactions, i.e.,
> > execute each statement for data generation (e.g., one INSERT) in single
> > transaction, and then want to measure the performance impact?
> > But the patch doesn't enable us to do such data generation yet.
>
> Indeed, you cannot do this precise thing, but you can do others.
>
> > So I'm thinking that it's maybe better to commit the addtion of "G" option
> > first separately. And then we can discuss how much "(" and ")" options
> > are useful later.
>
> Attached patch v6 only provides G - server side data generation.

Thanks for the patch!

+ snprintf(sql, sizeof(sql),
+ "insert into pgbench_branches(bid,bbalance) "
+ "select bid, 0 "
+ "from generate_series(1, %d) as bid", scale);

"scale" should be "nbranches * scale".

+ snprintf(sql, sizeof(sql),
+ "insert into pgbench_accounts(aid,bid,abalance,filler) "
+ "select aid, (aid - 1) / %d + 1, 0, '' "
+ "from generate_series(1, %d) as aid", naccounts, scale * naccounts);

As with client-side data generation, shouldn't INT64_FORMAT be used here
instead of %d? For large scale factors, scale * naccounts can exceed the
range of a 32-bit int.
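
To illustrate the two points above, here is a minimal, self-contained sketch of how the two queries could be built. It is not the patch itself: the helper names build_branches_sql() and build_accounts_sql() are hypothetical, NBRANCHES and NACCOUNTS are assumed to mirror pgbench's nbranches and naccounts constants, and "%" PRId64 stands in for PostgreSQL's INT64_FORMAT so that the block compiles outside the PostgreSQL tree.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Rows per unit of scale; assumed to mirror pgbench's nbranches/naccounts. */
#define NBRANCHES 1
#define NACCOUNTS 100000

/* Hypothetical helper: build the server-side INSERT for pgbench_branches. */
static int
build_branches_sql(char *sql, size_t len, int scale)
{
	assert(scale > 0);

	/* The row count is nbranches * scale, not just scale. */
	return snprintf(sql, len,
					"insert into pgbench_branches(bid,bbalance) "
					"select bid, 0 "
					"from generate_series(1, %d) as bid",
					NBRANCHES * scale);
}

/* Hypothetical helper: build the server-side INSERT for pgbench_accounts. */
static int
build_accounts_sql(char *sql, size_t len, int scale)
{
	/* Widen before multiplying so that large scale factors do not overflow. */
	int64_t		total = (int64_t) NACCOUNTS * scale;

	assert(scale > 0);

	/* "%" PRId64 is a stand-in for INT64_FORMAT in this standalone sketch. */
	return snprintf(sql, len,
					"insert into pgbench_accounts(aid,bid,abalance,filler) "
					"select aid, (aid - 1) / %d + 1, 0, '' "
					"from generate_series(1, %" PRId64 ") as aid",
					NACCOUNTS, total);
}
```

With a scale factor of 30000, for example, the accounts query asks generate_series() for 3000000000 rows, which does not fit in a 32-bit int.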

If a large scale factor is specified, the query that generates pgbench_accounts
data can take a very long time. While that query is running, operators are
likely to press Ctrl-C to cancel the data generation. In that case, IMO pgbench
should cancel the query, i.e., call PQcancel(). Otherwise, the query will keep
running on the server to the end.
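
As a rough sketch of the idea only, the pattern could look like the following. To keep the block self-contained it does not link against libpq: send_cancel_request() is a hypothetical stand-in for the point where real code would call PQcancel() on a PGcancel object obtained beforehand with PQgetCancel(), and the other names are likewise made up for illustration.

```c
#include <signal.h>

/* Set around the long-running query; only touched via sig_atomic_t. */
static volatile sig_atomic_t query_in_progress = 0;

/* Counts cancel requests so the sketch has an observable effect. */
static volatile sig_atomic_t cancel_requests_sent = 0;

/*
 * Hypothetical stand-in for the libpq cancel request.  Real code would call
 * PQcancel() here instead of bumping a counter.
 */
static void
send_cancel_request(void)
{
	cancel_requests_sent = cancel_requests_sent + 1;
}

/* SIGINT handler: forward Ctrl-C to the server only while a query runs. */
static void
handle_sigint(int signo)
{
	(void) signo;
	if (query_in_progress)
		send_cancel_request();
}

/*
 * Install the handler; returns 0 on success, -1 on failure.  Some platforms
 * reset the handler after each delivery, so production code would typically
 * prefer sigaction() where it is available.
 */
static int
install_cancel_handler(void)
{
	return (signal(SIGINT, handle_sigint) == SIG_ERR) ? -1 : 0;
}
```

The caller would set query_in_progress before sending the data-generation query and clear it once the result has been received, so that a Ctrl-C at other times does not issue a stray cancel.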

- for (step = initialize_steps; *step != '\0'; step++)
+ for (const char *step = initialize_steps; *step != '\0'; step++)

Per PostgreSQL basic coding style, ISTM that "const char *step"
should be declared separately from "for" loop, like the original.
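
For illustration, a minimal sketch of a step-checking loop written in that style follows. The function name first_invalid_init_step() is hypothetical, and the value given to ALL_INIT_STEPS is an assumption (the existing step characters plus "G"); the real pgbench code may accept additional characters and reports errors differently.

```c
#include <stdio.h>
#include <string.h>

/* Assumed set of step characters: the existing ones plus the new "G". */
#define ALL_INIT_STEPS "dtgGvpf"

/*
 * Hypothetical helper: return the first unrecognized step character in
 * initialize_steps, or '\0' if every character is allowed.
 */
static char
first_invalid_init_step(const char *initialize_steps)
{
	const char *step;			/* declared before the loop, not inside "for" */

	for (step = initialize_steps; *step != '\0'; step++)
	{
		if (strchr(ALL_INIT_STEPS, *step) == NULL)
			return *step;
	}

	return '\0';
}
```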

- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+ fprintf(stderr,
+ "unrecognized initialization step \"%c\"\n"
+ "Allowed step characters are: \"" ALL_INIT_STEPS "\".\n",
*step);
- fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");

The original message seems better to me. So what about just appending "G"
into the above latter message? That is,
"allowed steps are: \"d\", \"t\", \"g\", \"G\", \"v\", \"p\", \"f\"\n"

- <term><literal>g</literal> (Generate data)</term>
+ <term><literal>g</literal> or <literal>G</literal>
(Generate data, client or server side)</term>

Wouldn't it be better to explain a bit more what "client-side / server-side
data generation" means? For example, something like

When "g" (client-side data generation) is specified, data is generated
in the pgbench client and sent to the server. When "G" (server-side data
generation) is specified, only queries are sent from the pgbench client
and the data is then generated in the server. If the network bandwidth
between pgbench and the server is low, using "G" might make the data
generation faster.

Regards,

--
Fujii Masao
