Bring values in the second column into a single line (comma-separated) for each unique value in the first column

kchinnam · August 25, 2016, 5:11pm

I want to bring the values in the second column into a single line for each unique value in the first column.
My input:

jvm01, Web 2.0 Feature Pack Library
jvm01, IBM WebSphere JAX-RS
jvm01, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library
jvm02, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature Pack Library
jvm03, IBM WebSphere JAX-RS
jvm03, Custom03 Shared Library

Expecting this output

jvm01, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library

I could get the result using the code below, but I do not like it. There has to be a better way of doing this with an awk/sed/perl one-liner.

for jvm in $(cut -d',' -f1 file.txt | uniq); do
     libs=$(grep "^$jvm," file.txt | cut -d',' -f2 | paste -s -d, -)
     echo "$jvm,$libs"
done

jvm01, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library

Scrutinizer · August 25, 2016, 6:48pm

Try:

awk '{for(i=3; i<=NF; i+=2) $i=x; gsub(FS FS,FS)}1' RS= FS=, OFS=, file

or

awk '!NF{print s; next} $1!=p{p=s=$1}{s=s FS $2} END{print s}' FS=, file

kchinnam · August 25, 2016, 7:18pm

Neither option is working.

# awk '{for(i=3; i<=NF; i+=2) $i=x; gsub(FS FS,FS)}1' RS= FS=, OFS=, file
jvm01, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom01 Shared Library, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library

# awk '!NF{print s; next} $1!=p{p=s=$1}{s=s FS $2} END{print s}' FS=, file
jvm03, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library

Scrutinizer · August 26, 2016, 1:01am

Please convert your input file to unix format first.

tr -d '\r' < old_file > new_file
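
A quick way to check whether a file really contains carriage returns, as a minimal sketch assuming a POSIX shell. The file names dos.txt and unix.txt and their contents are made up for illustration.

```shell
# Hypothetical sample file with DOS (CRLF) line endings.
printf 'jvm01, Web 2.0 Feature Pack Library\r\njvm01, IBM WebSphere JAX-RS\r\n' > dos.txt

# Count carriage returns: tr -cd deletes every character except \r.
# The $(( )) wrapper strips any padding that wc may add.
cr_before=$(( $(tr -cd '\r' < dos.txt | wc -c) ))

# Strip them, then count again.
tr -d '\r' < dos.txt > unix.txt
cr_after=$(( $(tr -cd '\r' < unix.txt | wc -c) ))

echo "carriage returns before: $cr_before, after: $cr_after"
```

A count of zero before conversion means the line endings are not the problem.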

kchinnam · August 26, 2016, 9:55am

I checked whether the input file has "\r" using "od -c", and I do not see any carriage return characters. I still followed your suggestion, but I am getting the same result.

# cat infile.txt
jvm01, Web 2.0 Feature Pack Library
jvm01, IBM WebSphere JAX-RS
jvm01, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library
jvm02, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature Pack Library
jvm03, IBM WebSphere JAX-RS
jvm03, Custom03 Shared Library

# tr -d '\r' < infile.txt >infile1.txt; awk '!NF{print s; next} $1!=p{p=s=$1}{s=s FS $2} END{print s}' FS=, infile1.txt
jvm03, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library

RudiC · August 26, 2016, 10:11am

Your NEW input sample deviates seriously from the one in post #1! Both of Scrutinizer's proposals rely on empty lines separating the records.
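
A minimal sketch of why the empty lines matter, assuming a POSIX awk. The file names grouped.txt and flat.txt and the shortened sample data are made up for illustration. With RS set to the empty string, awk reads in paragraph mode, so each block separated by blank lines is one record:

```shell
# Hypothetical sample files: one with blank lines between groups, one without.
printf 'jvm01, A\njvm01, B\n\njvm02, A\njvm02, C\n' > grouped.txt
printf 'jvm01, A\njvm01, B\njvm02, A\njvm02, C\n' > flat.txt

# RS= (empty) selects paragraph mode; NR in END is the number of records read.
grouped_count=$(awk 'END { print NR }' RS= grouped.txt)
flat_count=$(awk 'END { print NR }' RS= flat.txt)

echo "records in grouped.txt: $grouped_count"
echo "records in flat.txt: $flat_count"
```

Without blank lines the whole file is a single record, which is why the first command joined everything into one line. For the same reason, the !NF test in the second command never matched, so only the END block printed, giving the last group.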

RavinderSingh13 · August 26, 2016, 10:16am

Hello kchinnam,

Could you please try the following? If you are not bothered about the order of the output (it may differ from the order in Input_file), then the following may help.

awk '{Q=$1} ($1 in A){$1=X} {A[Q]=A[Q]?A[Q] s1 $0:$0} END{for(i in A){print A[i]}}' s1="," Input_file

Output will be as follows.

jvm01, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library

NOTE: As RudiC already mentioned, your Input_file is different. I used the latest Input_file above.

Thanks,
R. Singh

kchinnam · August 29, 2016, 10:04am

Sorry for causing confusion with the input. I used extra empty lines for readability.
R. Singh, your command works great. Can you explain how this part of the awk works?
($1 in A){$1=X} {A[Q]=A[Q]?A[Q] s1 $0:$0}

RavinderSingh13 · August 29, 2016, 10:19am

Hello kchinnam,

Here is the same command split across lines, with a comment explaining each part.

awk '
# Save the first field (the key, e.g. "jvm01,") in variable Q.
{ Q = $1 }
# If the key is already an index of array A, blank out $1 (X is an unset, hence empty, variable).
# awk then rebuilds $0 without the key.
($1 in A) { $1 = X }
# Ternary: if A[Q] already holds a value, append s1 and the current line to it;
# otherwise store the current line (which still includes the key) as the first value.
{ A[Q] = A[Q] ? A[Q] s1 $0 : $0 }
# After the last line, loop over every index i of A and print the collected line.
# for (i in A) visits the indexes in no guaranteed order.
END { for (i in A) { print A[i] } }
' s1="," Input_file

Thanks,
R. Singh

RudiC · August 29, 2016, 10:23am

Try also

awk '$1 != LAST {printf "%s%s", DL, $1; LAST = $1; DL = RS} {printf ",%s", $2} END {print _}' FS=, file
jvm01, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library

kchinnam · August 29, 2016, 12:56pm

Hi Rudi, your command is working great as well.
One benefit I see with R. Singh's is that even if the jvm names in the first column are not in sorted order (next to each other), the output still has unique jvms with the correct second-column values.
Thanks for the great solutions.
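
For input where the keys are not adjacent and the first-seen order should still be kept, the two ideas above can be combined. This is a minimal sketch rather than anything posted in the thread: the file name unsorted.txt, the sample data, and the array names order and libs are made up, and it assumes exactly one comma per input line.

```shell
# Hypothetical input with the keys out of order.
printf 'jvm02, X\njvm01, A\njvm02, Y\njvm01, B\n' > unsorted.txt

result=$(awk -F, '
# Remember each key the first time it appears, in arrival order.
!($1 in libs) { order[++n] = $1 }
# Append the second field to the list collected for this key.
{ libs[$1] = libs[$1] "," $2 }
# Print the keys in first-seen order, each followed by its list.
END { for (i = 1; i <= n; i++) print order[i] libs[order[i]] }
' unsorted.txt)

echo "$result"
```

Looping over a numeric index instead of using for (i in A) is what keeps the output order stable.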