Working with strings (awk, sed, scripting, etc...)

Hi everybody,
For those who are bored, I suggest a little exercise :slight_smile:
Here is a "csv" string:

A,B,C,D,E,G

Desired output:

| (A) A | (A,B) B | (A,B,C) C | (A,B,C,D) D | (A,B,C,D,E) E | G

There are no whitespace characters at the beginning and end of the line.

Hi, thanks for the puzzle :slight_smile:

awk '{s=x; for(i=1; i<=NF-1; i++) {s=s (s?FS:"(") $i; $i=s ") " $i " " }}{sub(" $",x); print x,$0}' FS=, OFS='| ' file

Well, not sure if this is the most elegant solution:

awk '
    {TMP = $NF
     for (i=NF; i>1; i--)    {sub ("," $i, _)
                              TMP = (i==2?"(":_) $0 (i==2?")":_) " " $(i-1) " | " TMP
                             }
     gsub (/([^ ],[^ ]*)+/, "(&)", TMP)
     print "| "  TMP
    }
' FS=,  file
| (A) A | (A,B) B | (A,B,C) C | (A,B,C,D) D | (A,B,C,D,E) E | G

EDIT: or, a bit more straightforward,

awk -F, '{for (i=1; i<NF; i++) {TMP = TMP DL $i; DL = FS; printf "| (%s) %s ", TMP , $i}; print "| " $NF}' file
| (A) A | (A,B) B | (A,B,C) C | (A,B,C,D) D | (A,B,C,D,E) E | G

EDIT: revisiting the first proposal, it can be simplified somewhat:

awk '
        {TMP = $NF
         for (i=NF; i>1; i--)   {sub ("," $i, _)
                                 TMP = "(" $0 ") " $(i-1) " | " TMP
                                }
         print "| "  TMP
        }
' FS=,  file

@Scrutinizer, @RudiC Thank you for the good examples. I think the last one is optimal. I also have awk variants that parse the line not lengthwise but crosswise, using RS=",", and a not too complicated sed version. I will share my efforts after a while.

One might also try:

awk -F, '
NF {	for(i = 1; i < NF; i++) {
		printf("| (%s", $1)
		for(j = 2; j <= i; j++)
			printf("%s%s", FS, $j)
		printf(") %s ", $i)
	}
	print "| " $NF
}' file

This takes a slightly more verbose approach to the problem, but produces the same output as Scrutinizer's suggestion except for input lines containing no fields. My code won't give any output for empty input lines; Scrutinizer's code will produce an output line containing a vertical bar, a space, and a newline character for an empty input line.

If you want the output his code produces in that case, my code will do that if you remove the first occurrence of NF in my code. If you don't want the output his code produces in that case, his code will get rid of that line if you change the {sub in his code to NF{sub .
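
A quick way to see the difference described above is to feed both variants an empty line. This is only a sketch: the wrapper names with_guard and without_guard are made up for illustration, and the awk bodies are simply the nested-loop code with and without the leading NF.

```shell
# Hypothetical wrappers (names are illustrative, not from the thread).
# Both read comma-separated lines on standard input.

# Nested-loop version with the NF guard: empty lines produce no output.
with_guard() {
    awk -F, '
    NF {    for (i = 1; i < NF; i++) {
                printf("| (%s", $1)
                for (j = 2; j <= i; j++)
                    printf("%s%s", FS, $j)
                printf(") %s ", $i)
            }
            print "| " $NF
    }'
}

# Same code without the guard: an empty line yields "| " and a newline.
without_guard() {
    awk -F, '
    {       for (i = 1; i < NF; i++) {
                printf("| (%s", $1)
                for (j = 2; j <= i; j++)
                    printf("%s%s", FS, $j)
                printf(") %s ", $i)
            }
            print "| " $NF
    }'
}

printf '\n' | with_guard       # prints nothing
printf '\n' | without_guard    # prints a vertical bar and a space
```

For non-empty lines the two wrappers should behave identically.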


Hello to all. Thanks again for participating.
After @Don_Cragun's post I added explanations to each example.
Not very elegant, but it just works:

#!/bin/bash
:<<SPRAVKA
It works only with a single line.
Fields may contain spaces.
SPRAVKA

while read -d, P; do
        T=$T$d$P
        echo -n "$t| ($T) $P"
        d=,
        t=" "
done < file
echo -n " | "
grep -o '.$' file

This one is simple and doesn't even need the hold space.
It works on each line separately.

sed -rn 's/.$/ | &/; :1;s/^(\S*)(.),/\1 | (\1\2) \2/;t1;s/^ //p' file
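
For anyone who wants to follow the sed one-liner above step by step, here is the same script spread over several lines with comments. It is a sketch only: it assumes GNU sed (for -r and \S) and, like the one-liner, single-character fields. The wrapper name sed_prefixes is made up for illustration.

```shell
# Hypothetical wrapper (the name is illustrative); reads lines on stdin.
# Assumes GNU sed, because of -r and \S.
sed_prefixes() {
    sed -rn '
        # Put " | " in front of the last character: A,B,G becomes A,B, | G
        s/.$/ | &/
        :1
        # Take the last "X," of the leading run of non-blanks and turn it
        # into " | (prefix,X) X", leaving the shorter prefix in front.
        s/^(\S*)(.),/\1 | (\1\2) \2/
        # Repeat for as long as the substitution keeps succeeding.
        t1
        # Drop the leading blank left by the final round and print.
        s/^ //p
    '
}

printf 'A,B,C,D,E,G\n' | sed_prefixes
```

The loop works from right to left: each round peels one field off the end of the still-unprocessed prefix, which is why no hold space is needed.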

The next one is elegant in my opinion, but I doubt its efficiency.
It treats the whole input as a single string in which every item except the last ends with a comma.

awk 'RT {T = T (T?RS:"(") $1; printf "| %s) %s%s", T, $1, FS; next} {print "| "$1}' RS=, file

Nice use of RT; however, RT is GNU awk only.

Here is another option using RS=, which should work with any POSIX awk:

awk 'p{printf "| (%s) %s ",s p,p; s=s p RS } {p=$1} END{print "| " p}' RS=, file

Of course, these options only work with single lines; otherwise we would need to add NR%c conditions.
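
For multi-line input, one alternative to NR%c bookkeeping is to go back to splitting on FS and reset the accumulated prefix for every record. A minimal sketch, assuming plain POSIX awk; the wrapper name prefixes and the variable names acc and sep are made up here, and the body is essentially the earlier -F, one-liner with a reset added.

```shell
# Hypothetical wrapper (the name is illustrative); reads lines on stdin.
prefixes() {
    awk -F, '
    NF {
        acc = sep = ""                  # fresh prefix for every input line
        for (i = 1; i < NF; i++) {
            acc = acc sep $i            # grow the prefix: A, then A,B, ...
            sep = FS
            printf "| (%s) %s ", acc, $i
        }
        print "| " $NF                  # the last field gets no prefix
    }'
}

printf 'A,B,C\nX,Y\n' | prefixes
```

Each input line is handled independently, and empty lines are skipped because of the NF guard.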


@Scrutinizer, then like this, but it also has to forget about multi-line input:

awk '!(length-1) {T = T (T?RS:"(") $1; printf "| %s) %s%s", T, $1, FS; next} {print "| "$1}' RS=, <<<"A,B,C,D,E,G"