Re: Ltree syntax improvement

From: Nikolay Shaplov <dhyan(at)nataraj(dot)su>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Dmitry Belyavsky <beldmit(at)gmail(dot)com>
Subject: Re: Ltree syntax improvement
Date: 2019-02-20 09:28:32
Message-ID: 1659194.9VRYtGT3Pi@x200m
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

В письме от вторник, 29 января 2019 г. 20:43:07 MSK пользователь Dmitry
Belyavsky написал:
> Dear all,
>
> Please find attached the patch extending the sets of symbols allowed in
> ltree labels. The patch introduces 2 variants of escaping symbols, via
> backslashing separate symbols and via quoting the labels in whole for
> ltree, lquery and ltxtquery datatypes.

Sorry I still did not do full revision, I've stumbled up on other things in
the code, which, as I think should be fixed before final checking through. (I
guess that code do what is expected, since it passes tests and so on, it needs
recheck, but I'd like to do this full recheck only once)

So my doubts about this code is that a lot parts of code is just a copy-paste
of the same code that was written right above. And it can be rewritten in a
way that the same lines (or parts of lines) is repeated only once...
This should make code more readable, and make code more similar to the code of
postgres core, I've seen.

Here I will speak about lquery_in function from ltree_io.c.

1. May be it is better here to switch from else if (state ==
LQPRS_[something]) to switch (stat) { case LQPRS_[something]: }
because because here the main case is to determine what is the current state
of your state machine, do something with it, and then go to the next step.

Now after you've done what should be done in this step, you should make sure
that no other code is executed writing more ifs... If you use switch/case you
will be able to say break wherever you like, it will save you some ifs.

Theoretical example

(cl is for character length fc is for fist char)

if (cl==1 && fc =='1') {do_case1();}
else if (cl==1 && fc =='2') {do_case2();}
else if (cl==1 && fc =='3') {do_case3();}
else {do_otherwise();}

If it is wrapped in switch/case it will be like

if (cl==1)
{
if (fc =='1') {do_case1(); break;}
if (fc =='2') {do_case2(); break;}
if (fc =='3') {do_case3(); break;}
}
/*if nothing is found */
do_otherwise();

This for example will allow you to get rid of multiply charlen == 1 checks in

else if ((lptr->flag & LVAR_QUOTEDPART) == 0)
{
if (charlen == 1 && t_iseq(ptr, '@'))
{
if (lptr->start == ptr)
UNCHAR;
lptr->flag |= LVAR_INCASE;
curqlevel->flag |= LVAR_INCASE;
}
else if (charlen == 1 && t_iseq(ptr, '*'))
{
if (lptr->start == ptr)
UNCHAR;
lptr->flag |= LVAR_ANYEND;
curqlevel->flag |= LVAR_ANYEND;
}
else if (charlen == 1 && t_iseq(ptr, '%'))
{
if (lptr->start == ptr)
UNCHAR;
lptr->flag |= LVAR_SUBLEXEME;
curqlevel->flag |= LVAR_SUBLEXEME;
}
else if (charlen == 1 && t_iseq(ptr, '|'))
{
real_nodeitem_len(lptr, ptr, escaped_count,
tail_space_bytes, tail_space_symbols);

if (state == LQPRS_WAITDELIMSTRICT)
adjust_quoted_nodeitem(lptr);

check_level_length(lptr, pos);
state = LQPRS_WAITVAR;
}
else if (charlen == 1 && t_iseq(ptr, '.'))
{
real_nodeitem_len(lptr, ptr, escaped_count,
tail_space_bytes, tail_space_symbols);

if (state == LQPRS_WAITDELIMSTRICT)
adjust_quoted_nodeitem(lptr);


check_level_length(lptr, pos);

state = LQPRS_WAITLEVEL;
curqlevel = NEXTLEV(curqlevel);
}
else if (charlen == 1 && t_iseq(ptr, '\\'))
{
if (state == LQPRS_WAITDELIMSTRICT)
UNCHAR;
state = LQPRS_WAITESCAPED;
}
else
{
if (charlen == 1 && strchr("!{}", *ptr))
UNCHAR;
if (state == LQPRS_WAITDELIMSTRICT)
{
if (t_isspace(ptr))
{
ptr += charlen;
pos++;
tail_space_bytes += charlen;
tail_space_symbols = 1;
continue;
}

UNCHAR;
}
if (lptr->flag & ~LVAR_QUOTEDPART)
UNCHAR;
}
}

There are a lot of them, I would suggest to reduce there number to one check
in the right place.

Similar thing is about "GETVAR(curqlevel) = lptr = (nodeitem *)
palloc0(sizeof(nodeitem) * numOR);" in this part of code.

if (charlen == 1)
{
if (t_iseq(ptr, '!'))
{
GETVAR(curqlevel) = lptr = (nodeitem *)
palloc0(sizeof(nodeitem) * numOR);
lptr->start = ptr + 1;
state = LQPRS_WAITDELIM;
curqlevel->numvar = 1;
curqlevel->flag |= LQL_NOT;
hasnot = true;
}
else if (t_iseq(ptr, '*'))
state = LQPRS_WAITOPEN;
else if (t_iseq(ptr, '\\'))
{
GETVAR(curqlevel) = lptr = (nodeitem *)
palloc0(sizeof(nodeitem) * numOR);
lptr->start = ptr;
curqlevel->numvar = 1;
state = LQPRS_WAITESCAPED;
}
else if (strchr(".|@%{}", *ptr))
{
UNCHAR;
}
else
{
GETVAR(curqlevel) = lptr = (nodeitem *)
palloc0(sizeof(nodeitem) * numOR);
lptr->start = ptr;
state = LQPRS_WAITDELIM;
curqlevel->numvar = 1;
if (t_iseq(ptr, '"'))
{
lptr->flag |= LVAR_QUOTEDPART;
}
}
}
else
{
GETVAR(curqlevel) = lptr = (nodeitem *)
palloc0(sizeof(nodeitem) * numOR);
lptr->start = ptr;
state = LQPRS_WAITDELIM;
curqlevel->numvar = 1;
}

It is repeated almost is every branch of this if-subtree... Can it be
rewritten the way it is repeated only once?

And so on.

So my request is to reduce multiply repetition of the same code...

PS. One other thing that I've been thinking over but did not came to final
conclusion...

It is often quite often case

if (t_iseq(ptr, 'a'){same_code(); code_for_a();}
else if (t_iseq(ptr, 'b'){same_code(); code_for_b();}
else if (t_iseq(ptr, 'c'){same_code(); code_for_c);}
else something_else();

I've been thinking if it is possible to move this to

const unsigned char c=TOUCHAR(ptr);
switch(c)
{
case 'a':
case 'b':
case 'c':
same_code();
if (c=='a') {code_for_a();}
if (c=='b') {code_for_b();}
if (c=='c') {code_for_c();}
break;
default:
something_else();
}

I did not still get, how valid is replacement of t_iseq() with TOUCHAR() +
"==". It is quite equivalent now, but I do not know how the core code is going
to evolve, so can't get final understanding. I even tried to discuss it with
Teodor (you've seen it) but still did come to any conclusion.

So for now status of this idea is "my considerations about ways to deduplicate
the code".

So this is all essential things I can say about this patch as it is now. May
be somebody can add something more...

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-02-20 10:15:32 Re: [PATCH v20] GSSAPI encryption support
Previous Message Magnus Hagander 2019-02-20 08:40:18 Re: pg_basebackup ignores the existing data directory permissions