Re: pgbench - extend initialization phase control

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: btendouan <btendouan(at)oss(dot)nttdata(dot)com>, "ibrar(dot)ahmad(at)gmail(dot)com:" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - extend initialization phase control
Date: 2019-11-05 15:56:51
Message-ID: CAHGQGwF2EOq2YZJyzDagvn7Pf5-3CxAgkdtiZKhtPpQ_F4-pGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 31, 2019 at 11:54 PM Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
>
> Hello Masao-san,
>
> > + snprintf(sql, sizeof(sql),
> > + "insert into pgbench_branches(bid,bbalance) "
> > + "select bid, 0 "
> > + "from generate_series(1, %d) as bid", scale);
> >
> > "scale" should be "nbranches * scale".
>
> Yep, even if nbranches is 1, it should be there.
>
> > + snprintf(sql, sizeof(sql),
> > + "insert into pgbench_accounts(aid,bid,abalance,filler) "
> > + "select aid, (aid - 1) / %d + 1, 0, '' "
> > + "from generate_series(1, %d) as aid", naccounts, scale * naccounts);
> >
> > Like client-side data generation, INT64_FORMAT should be used here
> > instead of %d?
>
> Indeed.
>
> > If large scale factor is specified, the query for generating pgbench_accounts
> > data can take a very long time. While that query is running, operators may be
> > likely to do Ctrl-C to cancel the data generation. In this case, IMO pgbench
> > should cancel the query, i.e., call PQcancel(). Otherwise, the query will keep
> > running to the end.
>
> Hmmm. Why not. Now the infra to do that seems to already exists twice,
> once in "src/bin/psql/common.c" and once in "src/bin/scripts/common.c".
>
> I cannot say I'm thrilled to replicate this once more. I think that the
> reasonable option is to share this in fe-utils and then to reuse it from
> there. However, ISTM that such a restructuring patch which not belong to
> this feature.

Understood. Ok, let's discuss this in other thread that you started.

> > - for (step = initialize_steps; *step != '\0'; step++)
> > + for (const char *step = initialize_steps; *step != '\0'; step++)
> >
> > Per PostgreSQL basic coding style,
>
> C99 (20 years ago) is new the norm, and this style is now allowed, there
> are over a hundred instances of these already. I tend to use that where
> appropriate.

Yes, I understood there are several places using such style.
But I still wonder why we should apply such change here.
If there is the reason why this change is necessary here,
I'm OK with that. But if not, basically I'd like to avoid the change.
Otherwise it may make the back-patch a bit harder
when we change the surrounding code.

> > - fprintf(stderr, "unrecognized initialization step \"%c\"\n",
> > + fprintf(stderr,
> > + "unrecognized initialization step \"%c\"\n"
> > + "Allowed step characters are: \"" ALL_INIT_STEPS "\".\n",
> > *step);
> > - fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\",
> > \"p\", \"f\"\n");
> >
> > The original message seems better to me. So what about just appending "G"
> > into the above latter message? That is,
> > "allowed steps are: \"d\", \"t\", \"g\", \"G\", \"v\", \"p\", \"f\"\n"
>
> I needed this list in several places, so it makes sense to share the
> definition, and frankly the list of half a dozen comma-separated chars
> does not strike me as much better than just giving the allowed chars
> directly. So the simpler the better, from my point of view.

OK.

> > Isn't it better to explain a bit more what "client-side / server-side data
> > generation" is? For example, something like
>
> Ok.
>
> Attached v7 does most of the above, but the list of char message and the
> signal handling. The first one does not look really better to me, and the
> second one belongs to a restructuring patch that I'll try to submit.

Thanks for updating the patch!
Attached is the slightly updated version of the patch. Based on your
patch, I added the descriptions about logging of "g" and "G" steps into
the doc, and did some cosmetic changes. Barrying any objections,
I'm thinking to commit this patch.

While reviewing the patch, I found that current code allows space
character to be specified in -I. That is, checkInitSteps() accepts
space character. Why should we do this? Probably I understand
why runInitSteps() needs to accept space character (because "v"
in the specified string with -I is replaced with a space character
when --no-vacuum option is given). But I'm not sure why that's also
necessary in checkInitSteps(). Instead, we should treat a space
character as invalid in checkInitSteps()?

Regards,

--
Fujii Masao

Attachment Content-Type Size
pgbench-init-extended-7_fujii.patch application/octet-stream 11.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jehan-Guillaume de Rorthais 2019-11-05 16:29:18 Re: segmentation fault when cassert enabled
Previous Message Ibrar Ahmed 2019-11-05 15:50:54 Re: Do we have a CF manager for November?