Bring values in the second column onto a single line (comma-separated) for each unique value in the first column

I want to join the values in the second column onto a single line for each unique value in the first column.
My input:

jvm01, Web 2.0 Feature Pack Library
jvm01, IBM WebSphere JAX-RS
jvm01, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library
jvm02, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature Pack Library
jvm03, IBM WebSphere JAX-RS
jvm03, Custom03 Shared Library

Expecting this output

jvm01, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library

I can get this result with the code below, but I do not like it. There has got to be a better way of doing this with an awk/sed/perl one-liner.

for jvm in $(cut -d',' -f1 file.txt | uniq); do
     libs=$(grep "^$jvm," file.txt | cut -d',' -f2 | paste -s -d, -)
     echo "$jvm,$libs"
done

jvm01, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library

Try:

awk '{for(i=3; i<=NF; i+=2) $i=x; gsub(FS FS,FS)}1' RS= FS=, OFS=, file

or

awk '!NF{print s; next} $1!=p{p=s=$1}{s=s FS $2} END{print s}' FS=, file

Neither option works.

# awk '{for(i=3; i<=NF; i+=2) $i=x; gsub(FS FS,FS)}1' RS= FS=, OFS=, file
jvm01, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom01 Shared Library, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library
# awk '!NF{print s; next} $1!=p{p=s=$1}{s=s FS $2} END{print s}' FS=, file
jvm03, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library

Please convert your input file to unix format first.

tr -d '\r' < old_file > new_file

I checked the input file for "\r" with "od -c" and saw no carriage return characters. I followed your suggestion anyway and got the same result.
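
As a cross-check that does not depend on reading "od -c" output by eye, the carriage returns can simply be counted. This is only a sketch; count_cr is a made-up helper name and the sample text is a placeholder:

```shell
# Count carriage return characters arriving on standard input.
count_cr() {
    # tr -cd '\r' deletes every character except \r; wc -c counts what is left.
    tr -cd '\r' | wc -c
}

# Sample with one DOS-style line ending and one Unix-style line ending.
printf 'a\r\nb\n' | count_cr
```

A result of 0 for the real file (count_cr < infile.txt) would confirm that line endings are not the cause.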

# cat infile.txt
jvm01, Web 2.0 Feature Pack Library
jvm01, IBM WebSphere JAX-RS
jvm01, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library
jvm02, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature Pack Library
jvm03, IBM WebSphere JAX-RS
jvm03, Custom03 Shared Library

# tr -d '\r' < infile.txt >infile1.txt; awk '!NF{print s; next} $1!=p{p=s=$1}{s=s FS $2} END{print s}' FS=, infile1.txt
jvm03, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library

Your NEW input sample deviates seriously from the one in post #1! Both of Scrutinizer's proposals rely on empty lines separating the records.
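
To illustrate why the empty lines matter: when RS is set to the empty string, awk reads in "paragraph mode", where each block of lines delimited by empty lines is one record. A minimal sketch with placeholder data (the keys a, b and the helper name count_records are not from the thread):

```shell
# Report how many records awk sees in paragraph mode (RS set to the empty string).
count_records() {
    awk -v RS= 'END { print NR }'
}

# Two blocks separated by an empty line: two records.
printf 'a, x\na, y\n\nb, z\n' | count_records

# The same lines without the empty line: one single record,
# which is why all the values ended up on one output line.
printf 'a, x\na, y\nb, z\n' | count_records
```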

Hello kchinnam,

Could you please try the following? It may help if you are not concerned about whether the output lines come out in the same order as in Input_file.

awk '{Q=$1} ($1 in A){$1=X} {A[Q]=A[Q]?A[Q] s1 $0:$0} END{for(i in A){print A[i]}}' s1="," Input_file

Output will be as follows.

jvm01, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library

NOTE: As RudiC already mentioned, your Input_file has changed. I used the latest Input_file above.

Thanks,
R. Singh

Sorry for causing confusion with the input. I had added the extra empty lines for readability.
R. Singh, your command works great. Can you explain how this part of the awk works?
($1 in A){$1=X} {A[Q]=A[Q]?A[Q] s1 $0:$0}

Hello kchinnam,

Here is the command again with an explanation of each part.

awk '
{Q=$1}          #### Save the first field ($1) in the variable Q.
($1 in A){$1=X} #### If $1 is already an index of array A, empty $1 (X is an unset variable); awk then rebuilds $0 without it.
                #### Next, build up A[Q] with a conditional expression: condition ? value_if_true : value_if_false.
                #### The condition is A[Q] itself, which is TRUE once a value has been stored under index Q.
                #### If TRUE, append s1 (a comma) and the current line ($0) to A[Q]; if FALSE, store $0 as the first value.
{A[Q]=A[Q]?A[Q] s1 $0:$0}
END{            #### The END block runs after all input lines have been read.
for(i in A){    #### Loop over every index i of array A (the order is not guaranteed).
print A[i]}}    #### Print the value stored in A under index i.
' s1="," Input_file  #### s1="," sets the variable s1 to a comma; Input_file is the file to read.
 

Thanks,
R. Singh
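
For comparison, the conditional expression can be spelled out as a plain if/else. The following is a sketch of the same idea rather than the exact command above: merge_by_key is a made-up name, and it assumes the key is whatever precedes the first comma on each line.

```shell
merge_by_key() {
    awk -F, '
    {
        key  = $1                           # text before the first comma
        rest = substr($0, length(key) + 2)  # everything after that comma
        if (key in A)
            A[key] = A[key] "," rest        # key seen before: append the value
        else
            A[key] = $0                     # first occurrence: store the whole line
    }
    END {
        for (k in A) print A[k]             # the order of this loop is unspecified
    }' "$@"
}

printf 'jvm01, Web 2.0 Feature Pack Library\njvm01, IBM WebSphere JAX-RS\n' | merge_by_key
```

With no file argument the function reads standard input; otherwise pass the file name, as in merge_by_key Input_file.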


Try also

awk '$1 != LAST {printf "%s%s", DL, $1; LAST = $1; DL = RS} {printf ",%s", $2} END {print _}' FS=, file
jvm01, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature Pack Library, IBM WebSphere JAX-RS, Custom03 Shared Library

Hi Rudi, your command works great as well.
One benefit I see with R. Singh's is that even if the jvm names in the first column are not grouped together (on adjacent lines), the output still has one line per unique jvm with the correct second-column values.
Thanks for the great solutions.
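
For input where the same jvm name appears on non-adjacent lines, a possible variant also keeps the names in the order they first appear, since a plain for (k in A) loop does not guarantee any order. This is a sketch only; group_in_order and the shuffled sample lines are illustrative and not from the posts above.

```shell
group_in_order() {
    awk -F, '
    # Remember each new key once, in the order it first appears.
    !($1 in val) { order[++n] = $1 }
    # Append a comma plus the text after the first comma to the value for this key.
    { val[$1] = val[$1] "," substr($0, length($1) + 2) }
    # Print the keys in first-seen order, each followed by its collected values.
    END { for (i = 1; i <= n; i++) print order[i] val[order[i]] }
    ' "$@"
}

printf '%s\n' \
    'jvm02, Web 2.0 Feature Pack Library' \
    'jvm01, Web 2.0 Feature Pack Library' \
    'jvm02, IBM WebSphere JAX-RS' \
    'jvm01, Custom01 Shared Library' | group_in_order
```

For this sample the jvm02 line is printed first, followed by the jvm01 line, each with both of its libraries.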