awk experts please help

mirwasim · January 16, 2013, 2:45pm

I have a log file of 60 MB with 20k records which contains data like below.

this contains some data so removed

2891358271000020, 2012-12-02 23:16:17 , 2012-12-02 23:16:17 ,
378015123, 2012-12-02 23:16:19 , 2012-12-02 23:16:19 ,

Corona688 · January 16, 2013, 3:03pm

I don't quite understand how you're generating this data from that input. The same date appears in multiple places, making it unclear which goes where. Not to mention there appear to be two records output for four records input, with some information repeated and some not repeated in the output. How are they combined? What would happen if you had six of the same ID?

mirwasim · January 16, 2013, 3:11pm

This is the first complete log (input flist and output flist).
The input flist is what we receive from upstream, and the output flist is what we send downstream.

hi

so from this I am generating

2891358277113540, 2012-12-02 23:16:17 , 2012-12-02 23:16:17 ,

The script I mentioned earlier works fine, but it takes a long time.

Yoda · January 16, 2013, 3:36pm

awk '/PIN_FLD_UNIQUE_ID/ {
 uid=$NF;gsub(/"/,"",uid);
 print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt);
}' OFS=, filename | awk '!a[$0]++'

mirwasim · January 16, 2013, 3:52pm

And will

 #! /bin/sh or #! /bin/ksh

work for it?

I am getting an error with the script below:

#! /bin/ksh
awk '/PIN_FLD_UNIQUE_ID/ {
 uid=$NF;gsub(/"/,"",uid);
 print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt);
}' OFS=, $1 | awk '!a[$0]++'

Yoda · January 16, 2013, 3:55pm

It is an awk program, so you don't need a shebang: #!/bin/ksh

If you are on Solaris or SunOS, use nawk or /usr/xpg4/bin/awk, because the default awk there is the legacy version.

For redirection:

nawk '/PIN_FLD_UNIQUE_ID/ {
 uid=$NF;gsub(/"/,"",uid);
 print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt);
}' OFS=, filename | awk '!a[$0]++' > output_filename

mirwasim · January 16, 2013, 4:01pm

Yes, I use SunOS, so I tried nawk.

Please see the error:

nawk '/PIN_FLD_UNIQUE_ID/ {
 uid=$NF;gsub(/"/,"",uid);
 print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt); 

}' OFS=, $1 | awk '!a[$0]++' > output_filename
pin@ssapp1025[sms]$ new.sh dm_sms.pinlog20121216231502
awk: syntax error near line 1
awk: bailing out near line 1

Yoda · January 16, 2013, 4:09pm

My bad, you have to use nawk in both places:

nawk '/PIN_FLD_UNIQUE_ID/ {
 uid=$NF;gsub(/"/,"",uid);
 print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt);
}' OFS=, filename | nawk '!a[$0]++' > output_filename

I hope it helps.
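Since both stages of the pipeline must use the same awk, one way to keep them in sync is to store the command in a variable. This is only a sketch: the variable name AWK is made up for illustration, and the paths are the usual Solaris locations, assumed rather than checked on your machine.

```shell
# Pick one awk for the whole pipeline: the XPG4 awk or nawk if present
# (typical on Solaris), otherwise whatever "awk" is on the PATH.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
elif [ -x /usr/bin/nawk ]; then
    AWK=/usr/bin/nawk
else
    AWK=awk
fi

# Smoke test: the dedup idiom that the legacy awk rejects.
printf 'a\nb\na\n' | "$AWK" '!a[$0]++'
```

If the smoke test prints a and b once each, the same "$AWK" can be used for both commands in the pipeline above.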

mirwasim · January 16, 2013, 4:13pm

Yes it worked.

Yoda · January 16, 2013, 4:20pm

I'm not sure what you mean by 80%. Since you have the formatted output in a file, I believe this is something you can work out yourself.

mirwasim · January 16, 2013, 4:23pm

1234, 2012-12-02 23:16:17 , 2012-12-02 23:16:17

Yoda · January 16, 2013, 4:30pm

It is pretty straightforward: before printing the values, you have to add a condition:

if ((per==80)||(per==100)||(per>100))
 print uid,dt,dt,per;

Change the condition as per your requirement.
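To see what that condition lets through, here is a small sketch on made-up data (the two-column input and the names A to E are hypothetical, not taken from the log). Note that per==100 || per>100 is the same as per>=100.

```shell
# Hypothetical input: a name and a percentage per line.
filtered=$(printf '%s\n' 'A 50' 'B 80' 'C 90' 'D 100' 'E 120' |
awk '{ per = $2 + 0 }                       # adding 0 forces a numeric value
     per == 80 || per >= 100 { print $1, per }')
printf '%s\n' "$filtered"
# prints:
# B 80
# D 100
# E 120
```

Values such as 50 and 90 are dropped; exactly 80 and anything from 100 upward are kept.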

mirwasim · January 16, 2013, 4:36pm

Thanks, that is very helpful.

nawk '/PIN_FLD_UNIQUE_ID/ { 

 uid=$NF;gsub(/"/,"",uid);
 print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt);

}' OFS=, $1 | nawk '!a[$0]++'
if ((per==80)||(per==100)||(per>100))
 print uid,dt,dt,per; > output_filename

Yoda · January 16, 2013, 4:40pm

Here:

nawk '/PIN_FLD_UNIQUE_ID/ {
 uid=$NF;gsub(/"/,"",uid);
 if ((per==80)||(per==100)||(per>100))
   print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt);
}' OFS=, filename | nawk '!a[$0]++'
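Because the original complaint was run time, a possible variation is to drop duplicates inside the first awk instead of piping into a second one. This is a sketch, not a measured improvement. The sample lines are invented (the real log layout is not shown in the thread), the array name seen is my own, and awk is used here where you would use nawk on SunOS.

```shell
# Hypothetical sample: date/time in fields 2 and 3 of the UnknownProgramName line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
D 2012-12-02 23:16:17.482 UnknownProgramName
0 PIN_FLD_PERCENT DECIMAL [0] 80
0 PIN_FLD_UNIQUE_ID STR [0] "1001"
D 2012-12-02 23:16:17.482 UnknownProgramName
0 PIN_FLD_PERCENT DECIMAL [0] 80
0 PIN_FLD_UNIQUE_ID STR [0] "1001"
D 2012-12-02 23:16:19.105 UnknownProgramName
0 PIN_FLD_PERCENT DECIMAL [0] 120
0 PIN_FLD_UNIQUE_ID STR [0] "1002"
EOF

single_pass=$(awk '
/PIN_FLD_UNIQUE_ID/ {
    uid = $NF
    gsub(/"/, "", uid)
    if (per == 80 || per >= 100) {
        line = uid OFS dt OFS dt OFS per
        if (!seen[line]++)          # print each distinct output line once
            print line
    }
}
/PIN_FLD_PERCENT/    { per = $NF }
/UnknownProgramName/ { dt = $2 " " $3; sub(/\..*/, "", dt) }
' OFS=, "$sample")
rm -f "$sample"
printf '%s\n' "$single_pass"
```

The repeated 1001 record is printed only once, so the second nawk in the pipeline is no longer needed.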

mirwasim · January 31, 2013, 4:04am

Bipin, can you please explain what this awk program actually does? I find the individual parts hard to understand.

Yoda · January 31, 2013, 9:23am

nawk '/PIN_FLD_UNIQUE_ID/ {                     # If the line contains the pattern PIN_FLD_UNIQUE_ID
        uid=$NF;                                # Set uid to the value of the last field
        gsub(/"/,"",uid);                       # Remove double quotes from uid
        if ((per==80)||(per==100)||(per>100))   # If per == 80 OR per == 100 OR per > 100
                print uid,dt,dt,per;            # Print uid, dt (twice) and per
} /PIN_FLD_PERCENT/ {                           # If the line contains the pattern PIN_FLD_PERCENT
        per=$NF;                                # Set per to the value of the last field
} /UnknownProgramName/ {                        # If the line contains the pattern UnknownProgramName
        dt=sprintf("%s %s",$2,$3);              # Set dt to the 2nd and 3rd fields separated by a space
        gsub(/\..*/,"",dt);                     # Remove everything from the first period onward
}' OFS=, filename | nawk '!a[$0]++'             # Set the output field separator to a comma, read filename, then drop duplicate lines from the output
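To make the walk-through concrete, here is the same pipeline run on a few invented log lines. The layout of these lines is an assumption, since the real log was removed from the thread, and awk stands in for nawk so it runs on systems without nawk.

```shell
# Hypothetical log excerpt: three records, the second a repeat of the first.
sample=$(mktemp)
cat > "$sample" <<'EOF'
D 2012-12-02 23:16:17.482 UnknownProgramName
0 PIN_FLD_PERCENT DECIMAL [0] 80
0 PIN_FLD_UNIQUE_ID STR [0] "2891358271000020"
D 2012-12-02 23:16:17.482 UnknownProgramName
0 PIN_FLD_PERCENT DECIMAL [0] 80
0 PIN_FLD_UNIQUE_ID STR [0] "2891358271000020"
D 2012-12-02 23:16:19.105 UnknownProgramName
0 PIN_FLD_PERCENT DECIMAL [0] 50
0 PIN_FLD_UNIQUE_ID STR [0] "378015123"
EOF

result=$(awk '/PIN_FLD_UNIQUE_ID/ {
        uid=$NF;
        gsub(/"/,"",uid);
        if ((per==80)||(per==100)||(per>100))
                print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
        per=$NF;
} /UnknownProgramName/ {
        dt=sprintf("%s %s",$2,$3);
        gsub(/\..*/,"",dt);
}' OFS=, "$sample" | awk '!a[$0]++')
rm -f "$sample"
printf '%s\n' "$result"
# prints: 2891358271000020,2012-12-02 23:16:17,2012-12-02 23:16:17,80
```

The repeated record is printed once, and the record with 50 is filtered out. Note that per and dt keep their values from earlier lines, so the program relies on the UnknownProgramName and PIN_FLD_PERCENT lines coming before the PIN_FLD_UNIQUE_ID line of the same record.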