Working with strings (awk, sed, scripting, etc...)

nezabudka · May 25, 2019, 7:34am

Hi everybody,
For those who are bored, I suggest an exercise.
Given this "CSV" string:

A,B,C,D,E,G

Desired output:

| (A) A | (A,B) B | (A,B,C) C | (A,B,C,D) D | (A,B,C,D,E) E | G

There are no whitespace characters at the beginning and end of the line.

Scrutinizer · May 25, 2019, 7:49am

Hi, thanks for the puzzle

awk '{s=x; for(i=1; i<=NF-1; i++) {s=s (s?FS:"(") $i; $i=s ") " $i " " }}{sub(" $",x); print x,$0}' FS=, OFS='| ' file

RudiC · May 25, 2019, 2:45pm

Well, not sure if this is the most elegant solution:

awk '
    {TMP = $NF
     for (i=NF; i>1; i--)    {sub ("," $i, _)
                              TMP = (i==2?"(":_) $0 (i==2?")":_) " " $(i-1) " | " TMP
                             }
     gsub (/([^ ],[^ ]*)+/, "(&)", TMP)
     print "| "  TMP
    }
' FS=,  file
 | (A) A | (A,B) B | (A,B,C) C | (A,B,C,D) D | (A,B,C,D,E) E | G

EDIT: or, a bit more straightforward,

awk -F, '{for (i=1; i<NF; i++) {TMP = TMP DL $i; DL = FS; printf "| (%s) %s ", TMP , $i}; print "| " $NF}' file
 | (A) A | (A,B) B | (A,B,C) C | (A,B,C,D) D | (A,B,C,D,E) E | G

EDIT: revisiting the first proposal, it can be simplified somewhat:

awk '
        {TMP = $NF
         for (i=NF; i>1; i--)   {sub ("," $i, _)
                                 TMP = "(" $0 ") " $(i-1) " | " TMP
                                }
         print "| "  TMP
        }
' FS=,  file

nezabudka · May 25, 2019, 3:22pm

@Scrutinizer, @RudiC Thank you for the good examples. I think the last one is optimal. I also have variants that parse the line not lengthwise but crosswise, in awk using RS=",", as well as a fairly simple sed version. I will share my attempts a bit later.

Don_Cragun · May 25, 2019, 3:33pm

One might also try:

awk -F, '
NF {	for(i = 1; i < NF; i++) {
		printf("| (%s", $1)
		for(j = 2; j <= i; j++)
			printf("%s%s", FS, $j)
		printf(") %s ", $i)
	}
	print "| " $NF
}' file

This uses a little more verbose approach to the problem, but produces the same output as Scrutinizer's suggestion except for input lines containing no fields. My code won't give any output for empty input lines; Scrutinizer's code will produce an output line containing a vertical bar, a space, and a newline character for an empty input line.

If you want the output his code produces in that case, my code will do that if you remove the first occurrence of NF in my code. If you don't want the output his code produces in that case, his code will get rid of that line if you change the {sub in his code to NF{sub.
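
To make the empty-line difference concrete, here is a small sketch. It wraps the program above in a hypothetical shell function, prefixes, that takes the pattern as its first argument; that wrapper and the sample input are assumptions for this demo only. Passing 1 (always true) should behave like deleting the NF guard.

```shell
# Hypothetical wrapper: the awk pattern ("NF" or "1") is passed in as $1.
# "NF" skips empty lines; "1" is always true, which has the same effect
# as removing the NF guard from the program.
prefixes() {
  awk -F, "$1"' {
    for (i = 1; i < NF; i++) {
      printf("| (%s", $1)
      for (j = 2; j <= i; j++)
        printf("%s%s", FS, $j)
      printf(") %s ", $i)
    }
    print "| " $NF
  }'
}

# Made-up sample input: one CSV line followed by an empty line.
printf 'A,B,C\n\n' | prefixes NF   # one output line
printf 'A,B,C\n\n' | prefixes 1    # a second line holding "| " follows
```

With the guard, only the CSV line produces output; without it, the empty line yields a vertical bar followed by a space.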

nezabudka · May 26, 2019, 4:58am

Hello, all. Thanks again for participating.
Following @Don_Cragun's post, I have added an explanation to each example.
Not very elegant, but it just works:

#!/bin/bash
:<<SPRAVKA
It works only with a single line.
Spaces in the input are allowed.
SPRAVKA

while read -d, P; do
        T=$T$d$P
        echo -n "$t| ($T) $P"
        d=,
        t=" "
done < file
echo -n " | "
grep -o '.$' file

This one is simple and doesn't even need the hold space.
It works on each line separately.

sed -rn 's/.$/ | &/; :1;s/^(\S*)(.),/\1 | (\1\2) \2/;t1;s/^ //p' file
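
To see how the loop peels off one field per pass, the sketch below prints the pattern space at the start of every pass. It is the same script respelled with -E and [^[:space:]] instead of -r and \S, and split into -e pieces; that respelling, the extra p command and the function name trace_prefixes are my assumptions, not part of the original. Note that both (.) and .$ match a single character, so this only fits one-character fields like the ones in the puzzle.

```shell
# Trace version: "p" right after the label prints the pattern space
# before each substitution attempt.
trace_prefixes() {
  sed -n -E \
    -e 's/.$/ | &/' \
    -e ':1' \
    -e 'p' \
    -e 's/^([^[:space:]]*)(.),/\1 | (\1\2) \2/' \
    -e 't1' \
    -e 's/^ //p'
}

printf 'A,B,C\n' | trace_prefixes
# A,B, | C
# A, | (A,B) B | C
#  | (A) A | (A,B) B | C
# | (A) A | (A,B) B | C
```

Each pass moves the last remaining field out of the leading comma-separated run and inserts its prefix group; the final substitution only strips the leading space.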

This one is elegant in my opinion, but I doubt its efficiency.
It treats all the input as a single string in which every record except the last ends with a comma.

awk 'RT {T = T (T?RS:"(") $1; printf "| " T ") " $1 FS; next} {print "| "$1}' RS=, file

Scrutinizer · May 26, 2019, 6:26am

nezabudka:

[..]This one is elegant in my opinion, but I doubt its efficiency.
It treats all the input as a single string in which every record except the last ends with a comma.
awk 'RT {T = T (T?RS:"(") $1; printf "| " T ") " $1 FS; next} {print "| "$1}' RS=, file

Nice use of RT; however, RT is GNU awk only.

Here is another option using RS=, which should work with any POSIX awk:

awk 'p{printf "| (%s) %s ",s p,p; s=s p RS } {p=$1} END{print "| " p}' RS=, file

Of course, these options only work with single lines; otherwise we would need to add NR%c conditions.
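
For what it's worth, here is one possible way to cope with several lines while keeping RS=,. This is not the NR%c idea but a different trick: with the default FS, a record that spans a line break splits into two fields. It is only a sketch and assumes that fields contain no blanks; the function name prefixes_multiline is made up for the demo.

```shell
# Sketch only: assumes fields contain no blanks, so that a record holding
# a line break splits into exactly two fields under the default FS.
prefixes_multiline() {
  awk -v RS=, '
    # Emit the pending field together with the prefix collected so far.
    p != "" { printf "| (%s) %s ", s p, p; s = s p RS }
    # The record spans a line break: $1 closes the current line and
    # $2 opens the next one, so the prefix is reset.
    NF > 1  { print "| " $1; s = ""; p = $2; next }
            { p = $1 }
    END     { if (p != "") print "| " p }
  '
}

printf 'A,B,C\nD,E,F\n' | prefixes_multiline
# | (A) A | (A,B) B | C
# | (D) D | (D,E) E | F
```

Comparing p against "" rather than testing p on its own also keeps a field such as 0 from being skipped.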

nezabudka · May 26, 2019, 7:29am

@Scrutinizer, then like this, but it also has to give up on multi-line input:

awk '!(length-1) {T = T (T?RS:"(") $1; printf "| " T ") " $1 FS; next} {print "| "$1}' RS=, <<<"A,B,C,D,E,G"