Re: pgbench - allow backslash-continuations in custom scripts

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: coelho(at)cri(dot)ensmp(dot)fr, michael(dot)paquier(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench - allow backslash-continuations in custom scripts
Date: 2016-02-18 11:54:58
Message-ID: 20160218.205458.29647346.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

At Tue, 16 Feb 2016 08:05:10 -0500, Robert Haas
<robertmhaas(at)gmail(dot)com> wrote in
<CA+TgmoavXzXVV_k-89SbgMKB-Eyp+RpSKu_0tPGqx_ceEk=kCQ(at)mail(dot)gmail(dot)com>
> On Mon, Feb 15, 2016 at 1:04 AM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > I'm sorry, but I didn't understood the 'submission notes' exactly
> > means. Is it precise descriptions in source comments? or commit
> > message of git-commit?
>
> Write a detailed email explaining each change that is part of the
> patch and why it is there. Attach the patch to that same email.

Sorry for the silly question. I'll try to describe this in
detail. I hope you are patient enough to read my clumsy (but a
bit long) text.

First, I rebased the previous patch set and merged three of
them. Now they are of three patches.

1. Making SQL parser part of psqlscan independent from psql.

Moved psql's baskslsh command stuff out of original psqlscan.l
and some psql stuff the parser directly looked are used via a
set of callback functions, which can be all NULL for usages
from other than psql.

2. Making pgbench to use the new psqlscan parser.

3. Changing the way to hold SQL/META commands from array to
linked list.

The #2 introduced linked list to store SQL multistatement but
immediately the caller moves the elements into an array. This
patch totally changes the way to linked list.

I'll explain in detail of each patch.

1. Make SQL parser part of psqlscan independent from psql.

The new file psqlscan_slashbody.l is the moved-out part of
psqlscan.l. The prefix of flex symbols are changed to "yys".

The psqlscan(_slash).c are container C source which does what is
previously done by mainloop.c. This makes other bin/ programs to
link psqlscan.o directly, without compilation.

1.1. The psqlscanbody.l

It is the SQL part of old psqlscan.l but the difference between
them is a bit bothersome to see. I attached the diff between them
as "psqlscanbody.l.diff" for convenience.

1.1.1. Switching between two lexers.

The SQL parser and the metacommand parser should be alternately
callable each other but yyparse() cannot parse from intermediate
position of the input text. So I provided functions to remake a
parser state using an input text beginning at just after the
position already parsed. psql_scan_switch_lexer() and
psql_scan_slash_command_switch_lexer() do that for psqlscan.l and
psqlscan_backslash.l respectively.

I haven't found no variable available to know how many characters
the parser has eaten so I decided to have a counter for the usage
as PsqlScanState.curpos and keep it pointing just after the last
letter already parsed. It is rather a common way to advance it
using YY_USER_ACTION and I did so, but the parser occasionally
steps back using yyless() when it reads some sequences. Hence
yyless() should decrement .curpos, but flex has no hook point for
the purpose. I defined my_yyless() (the name should need to be
changed) macro to do this.

1.1.2. Detaching psqlscan from psql stuff

psqlscan.l had depended on psql through pset.vars, pset.encoding,
pset.db and psql_erorr(). I have modified the parser to access
them via callback functions. psql_scan_setup() now has the new
fourth parameter PsqlScanCallbacks to receive them. The two
callbacks of them, standard_strings and error_out wouldn't need
detailed explanation, but enc_mblen and get_variable would need.

pset.encoding was used only to check if PQmblen is applicable
then to be given to PQmblen. So I provided the callback enc_mblen
which could be given if strings can be encoded using it. The
another one, get_variable() is an equivalent of a wrapper
function of GetVariable(). GetVariable() was called directly in
lexer definitions and indirectly via escape_variable() in
psqlscan.l but escape_variable() was accessing pset.db only in
order to check existence of an active connection. I could give it
via another callback, but I have moved out the part of the
function accessing it because it is accessed only in the
function.

Finally, I defined the callbacks in common.c.

1.2. The reason for the names psqlscanbody.l and psqlscan_slashbody.l.

psqlscan.l was finally included as psqlscan.c by mainloop.c. The
reason is postgresql_fe.h must be read before psqlscan.c on some
platform, according to the comment at the end of mainloop.c. But
it is an annoyance when using it from other bin/
programs. Therefore, I provided dedicated .c files to do so for
the two lexer .c files. In order to make the name of the file to
be linked from outside psql be psqlscan.o, I have renamed the *.l
files to *body.l.

1.3 The psqlscan_int.h file

As the new psqlscan.o is used from outside psql, psqlscan.h
should have only definitions needed to do so. psqlscan_int.h
defines some stuffs used by the lexers but not necessary to use
them.

1.4 Other files

Other files that are not mentioned so far, Makefile, common.h,
psqlscan_slash.h and startup.c would'nt be neccesary to be
explained.

2. Making pgbench to use the new psqlscan parser.

By the patch #2, pgbench.c gets mainly two major
modifications. Splitting of process_commands() and adding
backslash-continuation feature to the added function
process_backslash_commands().

2.1. process_commands() has been splitted into two functions

The function process_commands() has been splitted into
process_commands() and new function
process_backslash_commands(). The former has been made to use
psqlscan. In contrast to psql, pgbench first checks if the input
on focus is a backslash command or not in process_commands(),
then parses it using psqlscan if it was not a backslash
command. process_backslash_commands() is a cut out from the
original process_command() and it is modified to handle backslash
continuation. Both functions read multiple lines in a context so
the processing context is to be made by the caller
(i.e. process_file) and kept throughout all input lines.

2.2 backslash continuation in process_backslash_commands().

The loop over input lines in the old process_commands() is
refactored to handle SQL statements and backslash commands
separately. The most part of the new process_backslash_commands()
is almost the same with the corresponding part in the old
process_commands() except indentation. "git diff -b -patience"
gives the far-less-noisy differences so I attached it as the
"pgbench.c.patient.diff" for convenience.

3. Changing the way to hold SQL/META commands from array to
linked list.

The patch #2 adds a new member "next" to Command_t. It is
necessary to handle SQL multistatements. But the caller
process_file holds the commands in an array of Commant_t. This is
apparently not in undesirable form. Since the liked list seems
simpler for this usage (for me), I decided to unify them to the
linked list. Most of this patch is rather simple one by one
replacement.

I hope this submission note makes sense and this patch becomes
easy to read.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Make-SQL-parser-part-of-psqlscan-independent-from-ps.patch text/x-patch 123.6 KB
0002-pgbench-uses-common-frontend-SQL-parser.patch text/x-patch 18.4 KB
0003-Change-the-way-to-hold-command-list.patch text/x-patch 13.8 KB
psqlscanbody.l.diff text/x-patch 27.8 KB
pgbench.c.patient.diff text/x-patch 9.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 大塚憲司 2016-02-18 12:02:15 The number of bytes is stored in index_size of pgstatindex() ?
Previous Message Vladimir Sitnikov 2016-02-18 11:28:25 Re: JDBC behaviour