Insert commas based on the max number of columns

nicholas_ejn · December 10, 2008, 3:49pm

Hi,

I am new to unix shell scripting. I have a specific requirement where I need to append commas based on the max number of columns in the file.

Eg:

If my source file looks something like this,

sengwa,china
tom,america,northamerica
smith,america
walter

My output file should look like
sengwa,china,
tom,america,northamerica
smith,america,
walter,,

Please help me to get this resolved.

Regards,
Sengwa

vidyadhar85 · December 10, 2008, 4:10pm

try this!!

awk -F"," 'NF<3{print $1","$2","}NF>2{print $0}' filename

nicholas_ejn · December 10, 2008, 4:26pm

Hi,

Thanks for your reply. The script you provided works only for a maximum of 3 columns, but the number of columns will vary in the source file.

Best regards,
Sengwa

vidyadhar85 · December 10, 2008, 4:47pm

provide some more input data

Lakris · December 10, 2008, 5:12pm

Hi Sengwa!

Interesting problem,

I cooked up something like this; it should work for any number of columns. It has to read through the file twice. One could avoid that by saving everything in an array, but since I don't know how many lines and columns there may be, that could get huge, so I guess this will do:

lakris@ubuntu:~/projekt/scripts/maxcol$ cat file
sengwa,china
tom,america,northamerica
smith,america
walter

lakris@ubuntu:~/projekt/scripts/maxcol$ cat col.sh 
#!/bin/bash
# Pass 1: find the largest number of columns on any line.
maxval=0
while read -r line
do
  newval=$(echo "$line" | tr , " " | wc -w)
  [ $newval -gt $maxval ] && maxval=$newval
done < file
# Pass 2: rewrite each line, padding it out to maxval columns.
>newfile
while read -r line
do
  arr=($(echo "$line" | tr , " "))
  echo -n "${arr[0]}" >> newfile
  for ((i=1;i<$maxval;i++))
  do
    echo -n ",${arr[$i]}" >> newfile
  done
  echo >> newfile
  unset arr
done < file

lakris@ubuntu:~/projekt/scripts/maxcol$ chmod +x col.sh 
lakris@ubuntu:~/projekt/scripts/maxcol$ ./col.sh


lakris@ubuntu:~/projekt/scripts/maxcol$ cat newfile 
sengwa,china,
tom,america,northamerica
smith,america,
walter,,

Will it work? I guess one could get away from using tr, echo and stuff and use some bash inline parameter modifications instead...
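
A minimal sketch of that idea, assuming bash: the columns are counted by deleting everything except the commas with a parameter expansion, so no tr, wc or echo is needed. The function name pad_columns is hypothetical and not part of the script above. Unlike col.sh, it leaves fields containing spaces or empty fields untouched.

```shell
#!/bin/bash
# pad_columns FILE: hypothetical helper that appends trailing commas so
# every line of FILE has as many columns as the widest line.
pad_columns() {
  local file=$1 line commas max=0 i

  # Pass 1: find the largest number of commas on any line.
  while IFS= read -r line; do
    commas=${line//[!,]/}          # delete every character that is not a comma
    (( ${#commas} > max )) && max=${#commas}
  done < "$file"

  # Pass 2: append the missing commas to each line and print it.
  while IFS= read -r line; do
    commas=${line//[!,]/}
    for (( i = ${#commas}; i < max; i++ )); do
      line+=,
    done
    printf '%s\n' "$line"
  done < "$file"
}
```

It would be called as `pad_columns file > newfile`. Like the original loop, it skips a final line that has no trailing newline.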

/Lakris

vgersh99 · December 10, 2008, 7:02pm

nawk -f nic.awk myFile myFile

nic.awk:

BEGIN {
  FS=OFS=","
}
NR==FNR{
  nf=(NF>nf) ? NF : nf
  next
}
{NF=nf; $1=$1;print}

vish_indian · December 11, 2008, 12:54pm

This much also works

awk 'BEGIN {
  FS=OFS=","
}
NR==FNR{
  nf=(NF>nf) ? NF : nf
  next
}
{NF=nf; print}' myFile myFile

$1=$1 is confusing

vgersh99 · December 11, 2008, 1:01pm

No, it's not confusing: it forces awk to rebuild the current record from its fields, joined with OFS.
Without it, the code does not work, at least under Sun/Solaris.
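
Since the need for $1=$1 apparently depends on the awk implementation, one way to sidestep the question is to leave NF alone and append the missing separators explicitly. The following is only a sketch of that variant, not the method from nic.awk; the wrapper name pad_csv is hypothetical.

```shell
# pad_csv FILE: hypothetical wrapper; reads FILE twice, like nic.awk,
# but never assigns to NF or to a field.
pad_csv() {
  awk -F, '
    # Pass 1 (first copy of the file): remember the widest line.
    NR == FNR { if (NF > nf) nf = NF; next }
    # Pass 2 (second copy): append one comma per missing column.
    {
      line = $0
      # An empty line has NF == 0 but still counts as one (empty) column.
      for (i = (NF ? NF : 1); i < nf; i++) line = line ","
      print line
    }
  ' "$1" "$1"
}
```

Calling `pad_csv myFile` on the sample data from the first post should print the same four lines as the expected output there.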

vish_indian · December 11, 2008, 1:08pm

Works in Linux :), can't comment about Solaris

In my opinion it should work there too. My understanding of the logic is that

 NR==FNR{
  nf=(NF>nf) ? NF : nf
  next
}

finds the maximum value of NF across the first input file.

When the code runs on the second input file,

 {NF=nf; print}

it simply sets NF to that maximum for each row and prints the row.
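
The two-pass logic needs a file that can be read twice. When the data arrives on a pipe and fits in memory, the array approach Lakris mentioned earlier in the thread is an alternative. A rough sketch, assuming the whole input can be buffered; the name pad_stdin is hypothetical.

```shell
# pad_stdin: hypothetical filter; buffers all of stdin in memory, then
# prints every line padded with commas to the widest column count.
pad_stdin() {
  awk -F, '
    {
      row[NR]  = $0
      cols[NR] = (NF ? NF : 1)      # an empty line counts as one column
      if (cols[NR] > nf) nf = cols[NR]
    }
    END {
      for (r = 1; r <= NR; r++) {
        line = row[r]
        for (i = cols[r]; i < nf; i++) line = line ","
        print line
      }
    }
  '
}
```

Usage would be something like `some_command | pad_stdin > newfile`. The trade-off is memory: every line is held until the end of input, which is the concern raised above for very large files.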