awk command to manipulate csv file in UNIX

Hi, I am new to awk and unix programming and trying to manipulate a csv file.

My current csv file looks like this:

col1,col2,col3,col4,col5,col4,col5,col6,col7,col8
223,V,c,2,4,f,r,,y,z
223,V,c,3,2,f,r,,y,z
223,V,c,1,4,f,r,,y,z
223,V,c,4,3,f,r,,y,z
227,V,c,3,1,f,r,,y,z
227,V,c,4,1,f,r,,y,z
228,V,c,5,5,f,r,,y,z
228,V,c,2,4,f,r,,y,z
228,V,c,1,7,f,r,,y,z
228,V,c,1,8,f,r,,y,z
228,V,c,1,1,f,r,,y,z
228,V,c,1,4,f,r,,y,z
228,V,c,1,5,f,r,,y,z

Desired output csv:

col1,Sum of col4,Sum of col5
223,10,13
227,7,2
228,12,34

I tried a bunch of awk commands but nothing seemed to work :frowning:

Is there any awk solution for this?

Thank you in advance.

awk -F, '
        {
                A_4[$1] += $4
                A_5[$1] += $5
        }
        END {
                for ( k in A_4 )
                        print k, A_4[k], A_5[k]
        }
' OFS=, file

its not working correctly, in the middle of the file it prints the header in one of the rows, and the first line is ,0,0

my csv file has headers so i added NR==1

still no luck

Yoda's suggestion looks good except for the way it handles the input and output header lines. You might want to try this slight modification:

awk -F, '
NR == 1 {
	OFS = ","
	print $1, "Sum of " $4, "Sum of " $5
	next
}
{	s4[$1] += $4
	s5[$1] += $5
}
END {	for(i in s4)
		print i, s4, s5
}' file.csv

Note that the output produced by this script could (except for the header line) have the output lines in random order. If you need the output to be sorted by increasing numeric order of the 1st field, or if you want the output to be in the same order as the input (assuming all lines with the same 1st field value are grouped together as in your sample input), the script would need changed to consistently produce the output order you want.

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .

To retain the col1's contents' order, try

awk '
NR == 1 {print "col1,Sum of col4,Sum of col5"
         next
        }
$1 != LAST &&
NR > 2  {print LAST, SUM4, SUM5
         SUM4 = SUM5 = 0
        }
        {SUM4 += $4
         SUM5 += $5
         LAST  = $1
        }
END     {print LAST, SUM4, SUM5
        }
' FS=, OFS=, file
col1,Sum of col4,Sum of col5
223,10,13
227,7,2
228,12,34

A few bits of Perl.

perl -nalF, -e '                                  # set Perl to make tokens out of lines
BEGIN{$, = $" = ","}                              # output separators
$c[$o++] = $F[0] unless $col1{$F[0]};             # create a queue order
$col1{$F[0]}->[0] += $F[3];                       # add to column 4
$col1{$F[0]}->[1] += $F[4];                       # add to column 5
END{
    $col1{$c[0]}=["Sum of col4", "Sum of col5"];  # set header
    for(@c){print $_,@{$col1{$_}}}                # display tally
}' file.csv
col1,Sum of col4,Sum of col5
223,10,13
227,7,2
228,12,34

Thank you so much! Both RudiC and Aia's solutions work, howeverwith the awk solution, the second line prints

,0,0

and with the perl solution the last line prints

,0,0

Can someone explain what might be causing this?

Difficult to believe. The proposal was tested with your sample in post#1 before posted with the given result, and a repeated test now yields the same.
So - the difference must lie in

  • the awk version - which is yours?
  • the input file / file structure
  • correctness of copy and paste the program
    Please post the conditions that you ran the program in.

---------- Post updated at 19:07 ---------- Previous update was at 19:07 ----------

One suspicion: an empty line in (at the end of?) the input file. Result with an empty line:

col1,Sum of col4,Sum of col5
223,10,13
227,7,2
228,12,34
,0,0

That's the result of your data having empty line(s).