Adding Column Values Using Pattern Match

Hi All,

I have a file with data as below:

A,FILE1_MYFILE_20130309_1038,80,25.60
B,FILE1_MYFILE_20130309_1038,24290,18543.38
C,FILE1_dsc_dlk_MYFILE_20130309_1038,3,10.10
A,FILE2_MYFILE_20130310_1039,85,110.10
B,FILE2_MYFILE_20130310_1039,10,12.10
C,FILE2_err_dlk_MYFILE_20130310_1039,3,10.10

I am using following command to sum values of 3 column based on the value in second column.

for i in `cat OUTPUT_FILE|awk -F"," '{print $2}'|sort -u`;do grep $i OUTPUT_FILE|awk -F"," '{c+=$3}END{print $2".edr""|"c}';done

However the above command will definitely exclude third row as the value of second column in third row is not matching with that for other two rows. However, I want to include the third rwo as well. Hence I need to perform a pattern matching in the underlined part so that my output looks as below:

FILE1_MYFILE_20130309_1038,24373
FILE2_MYFILE_20130310_1039,98

How do I perform a pattern matching here.

Thanks and Regards
Angshuman

One way:

awk -F, '{x=$2;sub(/_.*/,"",x);if(!a[x])a[x]=$2;b[x]+=$3;}END{for (i in a)print a","b;}' file

Guru.

Hi Guru,

Thank you for your reply.

Why is it not working agaisnt following data set. Can you please explain a bit on how is the sub function working in awk.

SUSPREL,ICP_MED_DEL_SEM_20130309_1038,80,25.60
REL,ICP_MED_DEL_SEM_20130309_1038,24290,18543.38
ERROR_ALLRATE_DSC_DLK,ICP_dsc_dlk_MED_DEL_SEM_20130309_1038,3,10.10
SUSPREL,ICP_MED_DEL_SEM_20130309_1039,80,25.60
REL,ICP_MED_DEL_SEM_20130309_1039,24290,18543.38
ERROR_ALLRATE_DSC_DLK,ICP_dsc_dlk_MED_DEL_SEM_20130309_1039,3,10.10

Another point is that the data between ICP and MED can be of any leth.

---------- Post updated at 01:15 PM ---------- Previous update was at 12:30 PM ----------

Hi Rudic,

Thank you for your reply. However, the output that you have showed is not what I am expecting

Let me explain a bit more. If you check the value in column 2, you will see that the values are like FILE1_MYFILE_20130309_1038, FILE1_MYFILE_20130309_1038, FILE1_dsc_dlk_MYFILE_20130309_1038. These are actually from the same group. The only difference is that there are some additional values between FILE1 and MYFILE. Hence I want to add all the corresponsing column 3 values. If this is not possible in awk, I was thinking if I remove the values between FILE1 and MYFILE first using awk and then add the values in column 3. What is your input on that and how do I remove those values?

Thanks and Regards
Angshuman

A crude approach

awk -F"," '{a["ICP_"substr($2,index($2,"MED"))]+=$3} END{ for(i in a) print i,a}' OFS="," file

Hi Angshuman,

I figured that out and deleted my post, as it was irrelevant. In order to do what you want, we need to know what parts of the file names are to be retained, either by counting the subfields between separators (here: "_"), or by identifying static substrings (here: FILEn and MYFILE). If you can't give any hint on anchoring the to-be-deleted substring, things become difficult.

Hi Panyam,

Sorry your code does not provide the output that I am expecting. I get the following output:

MED_DEL_SEM_20130309_1038 24373
MED_DEL_SEM_20130309_1039 24373

and my expected result is

ICP_MED_DEL_SEM_20130309_1038,24373
ICP_MED_DEL_SEM_20130309_1039,24373

Check my earlier post now. I edited it.

$ awk '{sub (/ICP_.*MED/, "ICP_MED", $2); SUM[$2]+=$3}END {for (X in SUM) print X, SUM[X]}' FS="," OFS="," file
ICP_MED_DEL_SEM_20130309_1039,24373
ICP_MED_DEL_SEM_20130309_1038,24373

apologies for the same

Hi Avig,

Don't hijack others post. Please open a new request.

Regards,

2 Likes

avig,

In addition to what panyam said, do not post duplicates hoping for a faster response. Do yourself a favor and read the forum rules, because you have yet to post in a way that doesn't violate them: Simple rules of the UNIX.COM forums

I urge members to not reward thread hijacks with helpful responses.

Regards,
Alister

1 Like