Re: remove duplicated words in comments .. across lines

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: remove duplicated words in comments .. across lines
Date: 2018-09-09 09:11:21
Message-ID: 13012.1536484281@localhost
Lists: pgsql-hackers

Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> Resending to -hackers as I realized this isn't a documentation issue so not
> appropriate or apparently interesting to readers of -doc.
>
> Inspired by David's patch [0], find attached fixing words duplicated, across
> line boundaries.
>
> I should probably just call the algorithm proprietary, but if you really wanted to know, I've suffered again through sed's black/slashes.
>
> time find . -name '*.c' -o -name '*.h' |xargs sed -srn '/\/\*/!d; :l; /\*\//!{N; b l}; s/\n[[:space:]]*\*/\n/g; /(\<[[:alpha:]]{1,})\>\n[[:space:]]*\<\1\>/!d; s//>>&<</; p'
>
> Alternately:
> time for f in `find . -name '*.c' -o -name '*.h'`; do x=`<"$f" sed -rn '/\/\*/!d; :l; /\*\//!{N; b l}; s/\n[[:space:]]*\*/\n/g; /(\<[[:alpha:]]{1,})\>\n[[:space:]]*\<\1\>/!d; s//>>&<</; p'`; [ -n "$x" ] && echo "$f:" && echo "$x"; done |less

Alternatively, you could have used awk, since it can keep variables across
lines. This is a script I used to find those duplicates in a single file
(just for fun; I know your findings have already been processed):

# awk has no NULL keyword; an empty string marks "no previous token".
BEGIN { prev_line_last_token = "" }
{
    if (NF > 1 && $1 == "*" && length(prev_line_last_token) > 0)
    {
        if ($2 == prev_line_last_token &&
            # Characters used in ASCII charts are not duplicate words.
            $2 != "|" && $2 != "}")
            # Found a duplicate.
            printf("%s:%s, duplicate token: %s\n", FILENAME, FNR, $2);
    }

    if (NF > 1 && ($1 == "*" || $1 == "/*"))
        prev_line_last_token = $NF;
    else
    {
        # Empty line or not a comment line. Start a new search.
        prev_line_last_token = "";
    }
}
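To run the same check over a whole source tree rather than a single file,
one option is a small shell wrapper like the following. It is only a sketch,
not part of the original message: the function name find_dup_words is
hypothetical, and it assumes a POSIX find and awk. It also resets the
remembered token at the start of each file (FNR == 1), so a comment ending
one file cannot be matched against a comment starting the next.

```shell
# Hypothetical helper (not from the original mail): apply the awk logic above
# to every *.c and *.h file under the given directory (default: current dir).
find_dup_words() {
    find "${1:-.}" \( -name '*.c' -o -name '*.h' \) -exec awk '
        # New input file: forget the last token of the previous file.
        FNR == 1 { prev = "" }
        {
            # Comment continuation line whose first word repeats the last
            # word of the previous comment line (ASCII-chart chars skipped).
            if (NF > 1 && $1 == "*" && prev != "" &&
                $2 == prev && $2 != "|" && $2 != "}")
                printf("%s:%s, duplicate token: %s\n", FILENAME, FNR, $2)

            # Remember the last word of comment lines; reset otherwise.
            if (NF > 1 && ($1 == "*" || $1 == "/*"))
                prev = $NF
            else
                prev = ""
        }' {} +
}
```

Usage: find_dup_words src/backend. Using find's -exec ... {} + instead of
xargs avoids awk falling back to reading stdin when no files match.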

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com
