Shell script

Hi,

I want to parse csv file s.csv

a,b,server,c,d,e,f,g,h,i,j,k,l,m,n
1,ss,ip-login-date-01,1,2,3,4,5,6,7,8,9,1,2,3
1,dd,cd-login-01,2,3,4,5,6,7,8,9,1,2,3,4,
1,ww,ip-login-date-02,1,3,4,5,6,7,8,9,1,2,3,4
1,ff,gf-login-date-01,5,6,7,8,9,1,2,3,4,5,6,7
1,gg,ip-login-date-03,6,7,8,9,1,2,3,4,5,6,7,8

based on a particular name of the column,here say server,not sure that the column will always be in the second position
,it might be any column in the csv file.In that if the server name is name
for eg

ip-login-date-01
ip-login-date-02
ip-login-date-03

a,b,server,c,d,e,f,g,h,i,j,k,l,m,n

1,ss,ip-login-date-01,1,2,3,4,5,6,7,8,9,1,2,3
1,ww,ip-login-date-02,1,3,4,5,6,7,8,9,1,2,3,4
1,gg,ip-login-date-03,6,7,8,9,1,2,3,4,5,6,7,8

i wanted the output as

a,b,server,c,d,e,f,g,h,i,j,k,l,m,n
1,ss-ww-gg,ip-login-date,8,12,15,18,12,15,18,21,15,9,12,15

first column as it is
second column appending all those names separated with -
server name without sequence number i.e 01,02,03
remaining columns - sum
eg:

ip-login-date-01,1,2
ip-login-date-02,1,3
ip-login-date-03,6,7

6+1+1 = 8 in 4th column
7+3+2 = 12 in 5th column ....... and redirect output to k.csv

So,

  • What have you tried so far?
  • What error, messages or output are you getting?
  • What Operating System and version are you running?
  • What are your preferred tools?

Most importantly, what have you tried so far?

It's better for you if you research, have a go and then ask for suggestions so that you learn and everyone else can see your thinking. That might help someone else out in future.

Regards,
Robin

Hi,

I posted in this site after working a lot on this file.But i forgot to post wat i have tried.

I tried matching the pattern separating with - with the following

grep "[[:alnum:]].\+-.\+[[:alnum:]].\+-.\+[[:alnum:]].\+-.*" test.csv > cat>p.csv

my input file

Time,SubnetMask,ScopeName,percentFree,TotalDynamic,TotalReserved,Leased,Avail,otherAvail,PendingAvail,InTransition,ReservedActive,Unavail,ActiveDeactivated,Deactivated
20131203170000,171.99.192.0/23,SRT-PKT-ActiveSubCPE-13,87,509,0,66,399,44,0,0,0,0,0,0,
20131203170000,10.18.32.0/20,MSKKBK13CM0-ActiveSubCM-01,35,3578,0,2300,1144,127,0,7,0,0,0,0,
20131203170000,110.169.128.0/23,SKW-09-ActiveSubCPE-01,0,504,0,504,0,0,0,0,0,0,0,0,
20131203170000,10.18.32.0/20,MSKKBK13CM0-BZ-ActiveSubCM-01,100,248,0,0,224,24,0,0,0,0,0,0,
20131203170000,10.18.32.0/20,Total,39,3826,0,2300,1368,151,0,7,0,0,0,0,
20131203170000,10.18.48.0/20,BNA-ActiveSubCM-02,81,4090,0,774,2982,331,0,3,0,0,0,0,
20131203170000,171.98.202.0/23,SKW-09-ActiveSubCPE-08,41,509,0,300,188,21,0,0,0,0,0,0,
20131203170000,10.18.48.0/20,Total,81,4090,0,774,2982,331,0,3,0,0,0,0,
20131203170000,10.18.64.0/20,SMSSBK01CM0-BZ-ActiveSubCM-01,100,504,0,0,454,50,0,0,0,0,0,0,
20131203170000,10.18.64.0/20,SMSSBK01CM0-ActiveSubCM-01,76,3576,0,834,2467,274,0,1,0,0,0,0,
20131203170000,171.99.150.0/23,SRT-PKT-ActiveSubCPE-07,0,509,0,509,0,0,0,0,0,0,0,0,
20131203170000,171.99.186.0/23,SKW-09-ActiveSubCPE-04,0,509,0,507,2,0,0,0,0,0,0,0,
20131203170000,171.99.172.0/23,Total,25,509,0,380,114,14,0,1,0,0,0,0,
20131203170000,171.99.174.0/23,PNC-ActiveSubCPE-05,47,509,0,269,216,24,0,0,0,0,0,0,
20131203170000,171.99.174.0/23,Total,47,509,0,269,216,24,0,0,0,0,0,0,

this matches the pattern [anything]-[anything]-[anything]-[someinteger]

and output those lines to p.csv

executed output

20131203170000,58.10.152.0/23,SRT-PKT-ActiveSubCPE-13,87,509,0,66,399,44,0,0,0,0,0,0,
20131203170000,171.99.150.0/23,SRT-PKT-ActiveSubCPE-07,0,509,0,509,0,0,0,0,0,0,0,0,

But i wanted the output in such a awy that

those two lines should be merged intio one single line with the fields from field number 4 added,field number 2 appended and the scope name as SRT-PKT-ActiveSubCPE

without succeeded by integer
i wonder why its not giving me the output with the lines matching pattern SKW-09-ActiveSubCPE

Required output

20131203170000,58.10.152.0/23+171.99.150.0/23,SRT-PKT-ActiveSubCPE,87,1018,0,575,399,44,0,0,0,0,0,0

Thanks in advance

Regards,
Preethy

$ cat file
Time,SubnetMask,ScopeName,percentFree,TotalDynamic,TotalReserved,Leased,Avail,otherAvail,PendingAvail,InTransition,ReservedActive,Unavail,ActiveDeactivated,Deactivated
20131203170000,171.99.192.0/23,SRT-PKT-ActiveSubCPE-13,87,509,0,66,399,44,0,0,0,0,0,0,
20131203170000,10.18.32.0/20,MSKKBK13CM0-ActiveSubCM-01,35,3578,0,2300,1144,127,0,7,0,0,0,0,
20131203170000,110.169.128.0/23,SKW-09-ActiveSubCPE-01,0,504,0,504,0,0,0,0,0,0,0,0,
20131203170000,10.18.32.0/20,MSKKBK13CM0-BZ-ActiveSubCM-01,100,248,0,0,224,24,0,0,0,0,0,0,
20131203170000,10.18.32.0/20,Total,39,3826,0,2300,1368,151,0,7,0,0,0,0,
20131203170000,10.18.48.0/20,BNA-ActiveSubCM-02,81,4090,0,774,2982,331,0,3,0,0,0,0,
20131203170000,171.98.202.0/23,SKW-09-ActiveSubCPE-08,41,509,0,300,188,21,0,0,0,0,0,0,
20131203170000,10.18.48.0/20,Total,81,4090,0,774,2982,331,0,3,0,0,0,0,
20131203170000,10.18.64.0/20,SMSSBK01CM0-BZ-ActiveSubCM-01,100,504,0,0,454,50,0,0,0,0,0,0,
20131203170000,10.18.64.0/20,SMSSBK01CM0-ActiveSubCM-01,76,3576,0,834,2467,274,0,1,0,0,0,0,
20131203170000,171.99.150.0/23,SRT-PKT-ActiveSubCPE-07,0,509,0,509,0,0,0,0,0,0,0,0,
20131203170000,171.99.186.0/23,SKW-09-ActiveSubCPE-04,0,509,0,507,2,0,0,0,0,0,0,0,
20131203170000,171.99.172.0/23,Total,25,509,0,380,114,14,0,1,0,0,0,0,
20131203170000,171.99.174.0/23,PNC-ActiveSubCPE-05,47,509,0,269,216,24,0,0,0,0,0,0,
20131203170000,171.99.174.0/23,Total,47,509,0,269,216,24,0,0,0,0,0,0,
20131203170000,171.99.192.0/23,SRT-PKT-ActiveSubCPE-13.5,87,509,0,66,399,44,0,0,0,0,0,0,
20131203170000,171.99.192.0/23,SRT-PKT-ActiveSubCPE-13.5 ,87,509,0,66,399,44,0,0,0,0,0,0,
awk '

  $0 ~ pattern"-[0-9]+,.*"{
				if(!n){
					n=split($0,A,/,/)
				      }
				 else {
					for(i=2;i<=9;i++)
					A = i == 2 ? A[2] "+" $2 : i >3 ? (A+=$i) : A 
				      }				
			  }
  		       END{
				for(i=1;i<=n;i++){
							if(i==3)gsub(/-[0-9]+/,x,A[3])
							s = s ? s FS A : A		
						 }
				print s
     			  }
      ' FS=, pattern="SRT-PKT-ActiveSubCPE" file
$ bash tester
20131203170000,171.99.192.0/23+171.99.150.0/23,SRT-PKT-ActiveSubCPE,87,1018,0,575,399,44,0,0,0,0,0,0,

---------- Post updated at 08:27 PM ---------- Previous update was at 08:21 PM ----------

I could not find 58.10.152.0/23 in your datafile

Seems good

Who is "preethy" and who is "nivl" ? Double account ?

Thanks Akshay Hegde for the fix. I was struggling a lot on this. It was really helpful

But the pattern SRT-PKT-ActiveSubCP is not fixed,means there may be anything between [ ]-[ ]-[ ]-[digit]
for eg i should parse the field scope name,if scope name is like this

SMSSBK01CM0-ActiveSubCM-01
SMSSBK01CM0-ActiveSubCM-02
SMSSBK01CM0-ActiveSubCM-03

SKW-09-ActiveSubCPE-01
SKW-09-ActiveSubCPE-02
SKW-09-ActiveSubCPE-03

i should obtain the string before last "-" and search for all possibilities of the string and merge all those lines into single line with the second field appended,field number 2 appended and the scope name excluding the part from last "-".what all scope names are there in input file i should get them in output file with the above said modifications

Thanks,
Preethy

---------- Post updated at 09:11 AM ---------- Previous update was at 09:10 AM ----------

sorry we both are working in same office,we both have accounts in unix.com.internet is not working fine in my system so i gave reply from her account,now its working fine.so i replied from my account

Okay,

Following just gives you matched pattern, you need to have gawk ,

$ awk --re-interval '/([[:alnum:]]+-){3}([[:digit:]]+,).*/' file
20131203170000,171.99.192.0/23,SRT-PKT-ActiveSubCPE-13,87,509,0,66,399,44,0,0,0,0,0,0,
20131203170000,110.169.128.0/23,SKW-09-ActiveSubCPE-01,0,504,0,504,0,0,0,0,0,0,0,0,
20131203170000,10.18.32.0/20,MSKKBK13CM0-BZ-ActiveSubCM-01,100,248,0,0,224,24,0,0,0,0,0,0,
20131203170000,171.98.202.0/23,SKW-09-ActiveSubCPE-08,41,509,0,300,188,21,0,0,0,0,0,0,
20131203170000,10.18.64.0/20,SMSSBK01CM0-BZ-ActiveSubCM-01,100,504,0,0,454,50,0,0,0,0,0,0,
20131203170000,171.99.150.0/23,SRT-PKT-ActiveSubCPE-07,0,509,0,509,0,0,0,0,0,0,0,0,
20131203170000,171.99.186.0/23,SKW-09-ActiveSubCPE-04,0,509,0,507,2,0,0,0,0,0,0,0,

Following script does computation which you looking for

awk --re-interval '

/([[:alnum:]]+-){3}([[:digit:]]+,).*/{

					  z = $3
					  gsub(/-[0-9]+/,x,z)
			 if(!(z in S)){
					 
					  S[z] = $0
				      }
				 else {
					p = ""
					for(i=1;i<=n=split(S[z],A,/,/);i++)
					  {
					   A = i == 2 ? A[2] "+" $2 : i >3 ? (A+=$i) : A
					   p = p ? p FS A : A
					  }
					S[z] = p  
				      }				
			  }
  		       END{
			     for(k in S){
					 s = ""
				for(i=1;i<=n=split(S[k],A,/,/);i++)
						 {
							if(i==3)gsub(/-[0-9]+,/,x,A[3])
							s = s ? s FS A : A		
						 }
				print s
					}
     			  }
      ' FS=,  file
$ bash tester
20131203170000,10.18.32.0/20,MSKKBK13CM0-BZ-ActiveSubCM-01,100,248,0,0,224,24,0,0,0,0,0,0,
20131203170000,10.18.64.0/20,SMSSBK01CM0-BZ-ActiveSubCM-01,100,504,0,0,454,50,0,0,0,0,0,0,
20131203170000,171.99.192.0/23+171.99.150.0/23,SRT-PKT-ActiveSubCPE-13,87,1018,0,575,399,44,0,0,0,0,0,0,0
20131203170000,110.169.128.0/23+171.98.202.0/23+171.99.186.0/23,SKW-09-ActiveSubCPE-01,41,1522,0,1311,190,21,0,0,0,0,0,0,0

Note : If you have double account kindly let us know now only, otherwise you will be banned from the forum, our administrators will easily come to know about fake accounts.

1 Like

HI Akshay Hegde ,

             my requirement is i want to get the scope name in output file without succeeded by digit.i.e if scope name is bla-bla-bla-[digit] i should get in output file as bla-bla-bla.

I tried like this

awk -F, '{sub(/-[0-9]*$/,x,$3);print $3}' test.csv |cat>p.csv
awk -F, '{t=$0;getline<"test.csv";$3=t}1' p.csv > sample.csv
mv sample.csv test.csv

what i got is
executed output

20131203170000 171.99.192.0/23 SRT-PKT-ActiveSubCPE 87 509 0 66 399 44 0 0 0 0 0 0 
20131203170000 110.169.128.0/23 SKW-09-ActiveSubCPE 0 504 0 504 0 0 0 0 0 0 0 0 
20131203170000 10.18.32.0/20 MSKKBK13CM0-BZ-ActiveSubCM 100 248 0 0 224 24 0 0 0 0 0 0 
20131203170000 171.98.202.0/23 SKW-09-ActiveSubCPE 41 509 0 300 188 21 0 0 0 0 0 0 
20131203170000 10.18.64.0/20 SMSSBK01CM0-BZ-ActiveSubCM 100 504 0 0 454 50 0 0 0 0 0 0 
20131203170000 171.99.150.0/23 SRT-PKT-ActiveSubCPE 0 509 0 509 0 0 0 0 0 0 0 0 
20131203170000 171.99.186.0/23 SKW-09-ActiveSubCPE 0 509 0 507 2 0 0 0 0 0 0 0

and then passsing this through your code.But your code seems to work only for the file delimited with comma.

Any help is highly appreciated.

Thanks & Regards,
Preethy

Why don't you show expected result for given input sample ? we don't want your executed output, thread crossed 2 pages, with no conclusion.

Hi Akshay Hegde,
inputfile

Time,SubnetMask,ScopeName,percentFree,TotalDynamic,TotalReserved,Leased,Avail,otherAvail,PendingAvail,InTransition,ReservedActive,Unavail,ActiveDeactivated,Deactivated
20131203170000,171.99.192.0/23,SRT-PKT-ActiveSubCPE-13,87,509,0,66,399,44,0,0,0,0,0,0,
20131203170000,110.169.128.0/23,SKW-09-ActiveSubCPE-01,0,504,0,504,0,0,0,0,0,0,0,0,
20131203170000,10.18.16.0/20,Total,76,3824,0,882,2647,293,0,2,0,0,0,0,
20131203170000,10.18.32.0/20,MSKKBK13CM0-BZ-ActiveSubCM-01,100,248,0,0,224,24,0,0,0,0,0,0,
20131203170000,10.18.48.0/20,BNA-ActiveSubCM-02,81,4090,0,774,2982,331,0,3,0,0,0,0,
20131203170000,171.98.202.0/23,SKW-09-ActiveSubCPE-08,41,509,0,300,188,21,0,0,0,0,0,0,
20131203170000,10.18.48.0/20,Total,81,4090,0,774,2982,331,0,3,0,0,0,0,
20131203170000,10.18.64.0/20,SMSSBK01CM0-BZ-ActiveSubCM-01,100,504,0,0,454,50,0,0,0,0,0,0,
20131203170000,10.18.80.0/20,SRWWBK09CM0-ActiveSubCM-01,80,3576,0,704,2585,287,0,0,0,0,0,0,
20131203170000,171.99.150.0/23,SRT-PKT-ActiveSubCPE-07,0,509,0,509,0,0,0,0,0,0,0,0,
20131203170000,10.18.80.0/20,Total,81,3824,0,704,2809,311,0,0,0,0,0,0,
20131203170000,171.99.186.0/23,SKW-09-ActiveSubCPE-04,0,509,0,507,2,0,0,0,0,0,0,0,

expected

Time,SubnetMask,ScopeName,percentFree,TotalDynamic,TotalReserved,Leased,Avail,otherAvail,PendingAvail,InTransition,ReservedActive,Unavail,ActiveDeactivated,Deactivated
20131203170000,171.99.192.0/23+171.99.150.0/23,SRT-PKT-ActiveSubCPE,87,1018,0,575,399,44,0,0,0,0,0,0,
20131203170000,110.169.128.0/23+171.98.202.0/23+171.99.186.0/23,SKW-09-ActiveSubCPE,41,1522,0,1311,190,21,0,0,0,0,0,0
20131203170000,10.18.32.0/20,MSKKBK13CM0-BZ-ActiveSubCM,100,248,0,0,224,24,0,0,0,0,0,0,
20131203170000,10.18.48.0/20,BNA-ActiveSubCM,81,4090,0,774,2982,331,0,3,0,0,0,0,
20131203170000,10.18.64.0/20,SMSSBK01CM0-BZ-ActiveSubCM,100,504,0,0,454,50,0,0,0,0,0,0,
20131203170000,10.18.80.0/20,SRWWBK09CM0-ActiveSubCM,80,3576,0,704,2585,287,0,0,0,0,0,0,

lines containing the word Total have to be removed

Thanks in advance

Ok, use following code

awk  '
		 NR == 1 { print ; next } 
                 !/Total/{
					  z = $3
					  gsub(/-[0-9]+$/,x,z)
			  if(!(z in S)){
					  found[++m] = z
					  S[z] = $0
				       }
				  else {
					 p = ""
					 for( i=1; i<=split(S[z],A,/,/); i++ )
					    {
					       # A = i == 2 ? A[2] "+" $2 : i >3 ? (A+=$i) : A    update
                                               A = i == 2 ? A[2] "+" $2 : i >3 && i<=9 ? (A+=$i) : A
					       p = p ? p FS A : A
					    }
					 S[z] = p  
				       }				
			  }
  		       END{
			     for(q = 1; q <= m; q++)
			     {	
			       for(k in S)
				     {
					 if(k == found[q])
					   {
				  	     s = ""
				  	     for( i=1; i<=split(S[k],A,/,/); i++ )
						 {
							if(i==3)A[3]=k
							s = s ? s FS A : A		
						 }
					     print s
					   }
				     }
			     }
     			  }
      ' FS=,  file

Hi Akshay Hegde,
This is the output i got by executing your code.

Time,SubnetMask,ScopeName,percentFree,TotalDynamic,TotalReserved,Leased,Avail,otherAvail,PendingAvail,InTransition,ReservedActive,Unavail,ActiveDeactivated,Deactivated
20131203170000,171.99.192.0/23+171.99.150.0/23,SRT-PKT-ActiveSubCPE,87,1018,0,575,399,44,0,0,0,0,0,0,0
20131203170000,110.169.128.0/23+171.98.202.0/23+171.99.186.0/23,SKW-ActiveSubCPE,41,1522,0,1311,190,21,0,0,0,0,0,0,0
20131203170000,10.18.32.0/20,MSKKBK13CM0-BZ-ActiveSubCM,100,248,0,0,224,24,0,0,0,0,0,0,
20131203170000,10.18.48.0/20,BNA-ActiveSubCM,81,4090,0,774,2982,331,0,3,0,0,0,0,
20131203170000,10.18.64.0/20,SMSSBK01CM0-BZ-ActiveSubCM,100,504,0,0,454,50,0,0,0,0,0,0,
20131203170000,10.18.80.0/20,SRWWBK09CM0-ActiveSubCM,80,3576,0,704,2585,287,0,0,0,0,0,0,

but may i know why the first two lines are showing 16 fields instead of 15,with extra 0 at the end and the scope name
in second line is SKW-ActiveSubCPE instead of SKW-09-ActiveSubCPE

Please use codetags !

There are actually 16 fields, it was my fault

Replace this line in code A = i == 2 ? A[2] "+" $2 : i >3 ? (A+=$i) : A

with A = i == 2 ? A[2] "+" $2 : i >3 && i<=9 ? (A+=$i) : A

Hi Akshay Hegde,
Why the scope name in second line is SKW-ActiveSubCPE instead of SKW-09-ActiveSubCPE

I just updated #12 , Thank you.

Hi Akshay Hegde,

              I really appreciate what you've done. Thanks a bunch.

Thanks & Regards,
Preethy

---------- Post updated at 06:01 AM ---------- Previous update was at 05:09 AM ----------

Hi Akshay Hegde,
can u pleasa explain what is the functionality of gsub,S in if(!(z in S)),found[++m] = z and the block after end.Googling doesnt help me.

Thanks in advance,
Preethy.

Please read man awk first.

Thanks,
Akshay

Hi Akshay Hegde,

                   I have gone through man awk.But  i really cant understand gsub\(/-[0-9]\+$/,x,z\) what this is returning and what is S in if\(!\(z in S\)\).

Thanks ,
Preethy

Hah, If you do not mind, before go to sleep, I will give a short explaination of your question.
You know that, we got an array S now. (In awk, array is associated array.the element of it not only can be S[1], S[2] but also S["aa"],S["bb"]).
Now assume that array S has two elements:
S["aa-00-10-abc"]=100
S["bb-20-30-def"]=200
When z is a string "aa-00-10-abc" or "bb-20-30-def".
We say z in S, because z is a index of some element of array S.
Ok, now the gsub(/pattern/,replace_to,source)
awk find the 'pattern' in 'source' and replace 'pattern' space to 'replace_to'.
gsub return how many times it does a successful find-to-replace,
and the replace result still store in the 'source'.
For example:
z="bb-20-30-def"
x="-99"
gsub(/-[0-9]+$/,x,z) return 2 because there are two -[0-9]+ patterns in z (-20 and -30)and after gsub, z will be "bb-99-99-def"
Hope can help you~