awk experts, please help

I have a 60 MB log file with 20k records, containing data like the sample below.

(This part contains some data, so I removed it.)

2891358271000020, 2012-12-02 23:16:17 , 2012-12-02 23:16:17 ,
378015123, 2012-12-02 23:16:19 , 2012-12-02 23:16:19 , 

I don't quite understand how you're generating this data from that input. The same date appears in multiple places, so it's unclear which goes where. Also, there appear to be two output records for four input records, with some information repeated and some not. How are they combined? What would happen if you had six records with the same ID?

This is the first complete log (input flist and output flist).
"Input flist" is what we receive from upstream, and "output flist" is what we send downstream.

hi

So from this I am generating:

2891358277113540, 2012-12-02 23:16:17 , 2012-12-02 23:16:17 , 

The script I mentioned earlier works fine, but it is slow:

awk '/PIN_FLD_UNIQUE_ID/ {
 uid=$NF;gsub(/"/,"",uid);
 print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt);
}' OFS=, filename | awk '!a[$0]++'
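A minimal sketch, assuming the log layout is exactly what the script above expects: the duplicate filter can live inside the same awk program, so the output does not have to be piped through a second awk process. The function name extract_records and the AWK variable are hypothetical. Whether this is measurably faster depends on the awk and the machine, so it is worth timing both versions (for example with time) on the real 60 MB file before switching.

```shell
#!/bin/sh
# Single-pass sketch: extract uid, date and percent as before, and drop
# duplicate output lines inside the same awk program. Set AWK=nawk on
# Solaris/SunOS; plain awk is assumed elsewhere.
extract_records() {
    "${AWK:-awk}" -v OFS=, '
    /PIN_FLD_UNIQUE_ID/ {
        uid = $NF; gsub(/"/, "", uid)            # strip double quotes
        line = uid OFS dt OFS dt OFS per         # same text that print uid,dt,dt,per produces
        if (!seen[line]++) print line            # print only the first copy
    }
    /PIN_FLD_PERCENT/    { per = $NF }
    /UnknownProgramName/ { dt = $2 " " $3; gsub(/\..*/, "", dt) }  # drop text from the first period on
    ' "$@"
}
```

Usage, for example: AWK=nawk extract_records dm_sms.pinlog20121216231502 > output_filename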

---------- Post updated at 02:19 AM ---------- Previous update was at 02:17 AM ----------

Also, will

 #! /bin/sh or #! /bin/ksh 

work for it?

---------- Post updated at 02:22 AM ---------- Previous update was at 02:19 AM ----------

I am getting an error with this:


#! /bin/ksh
awk '/PIN_FLD_UNIQUE_ID/ {
 uid=$NF;gsub(/"/,"",uid);
 print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt);
}' OFS=, $1 | awk '!a[$0]++'

It is just an awk program called from a shell script, so the choice of shell does not matter much; #!/bin/ksh (or #!/bin/sh) will work.

If you are on Solaris or SunOS, use nawk or /usr/xpg4/bin/awk. The default /usr/bin/awk there is the old awk, which lacks features this script relies on.
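On Solaris the alternatives live at fixed paths, so a script can pick one up front. This is a sketch assuming the usual Solaris locations /usr/xpg4/bin/awk and /usr/bin/nawk; on other systems it falls back to plain awk. The function name pick_awk and the AWK variable are hypothetical.

```shell
#!/bin/sh
# Pick an awk that understands the thread's script. On Solaris/SunOS,
# /usr/bin/awk is the old awk, so prefer the XPG4 (POSIX) awk, then nawk.
# Anywhere else, fall back to whatever "awk" is on the PATH.
pick_awk() {
    if [ -x /usr/xpg4/bin/awk ]; then
        echo /usr/xpg4/bin/awk
    elif [ -x /usr/bin/nawk ]; then
        echo /usr/bin/nawk
    else
        echo awk
    fi
}

AWK=`pick_awk`    # backquotes also work in the old Solaris /bin/sh
```

Both stages of the pipeline can then use it: "$AWK" '...' OFS=, filename | "$AWK" '!a[$0]++'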

To redirect the output to a file:

nawk '/PIN_FLD_UNIQUE_ID/ {
 uid=$NF;gsub(/"/,"",uid);
 print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt);
}' OFS=, filename | awk '!a[$0]++' > output_filename

Yes, I use SunOS, so I tried nawk.

Please see the error:

nawk '/PIN_FLD_UNIQUE_ID/ {
 uid=$NF;gsub(/"/,"",uid);
 print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt); 

}' OFS=, $1 | awk '!a[$0]++' > output_filename
pin@ssapp1025[sms]$ new.sh dm_sms.pinlog20121216231502
awk: syntax error near line 1
awk: bailing out near line 1

My bad, you have to change both places; the second awk in the pipeline must be nawk too:

nawk '/PIN_FLD_UNIQUE_ID/ {
 uid=$NF;gsub(/"/,"",uid);
 print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt);
}' OFS=, filename | nawk '!a[$0]++' > output_filename
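For reference, the working pipeline can also be wrapped in a small script like the new.sh run above, taking the log file as its first argument. A sketch, assuming nawk is on the PATH; the AWK variable and the run_extract function are hypothetical additions so that the same awk is used in both stages and can be swapped (for example AWK=awk on Linux). Quoting "$1" keeps file names with spaces intact.

```shell
#!/bin/sh
# new.sh (sketch): run the extraction on the log file named in $1 and write
# the de-duplicated result to output_filename. AWK defaults to nawk, as in
# the reply above; both pipeline stages use the same awk.
AWK=${AWK:-nawk}

run_extract() {
    "$AWK" '/PIN_FLD_UNIQUE_ID/ {
     uid=$NF;gsub(/"/,"",uid);
     print uid,dt,dt,per;
    } /PIN_FLD_PERCENT/ {
     per=$NF;
    } /UnknownProgramName/ {
     dt=sprintf("%s %s",$2,$3);
     gsub(/\..*/,"",dt);
    }' OFS=, "$1" | "$AWK" '!a[$0]++'
}

# Only run when a log file was given; "$1" is quoted so names with spaces work.
if [ $# -ge 1 ]; then
    run_extract "$1" > output_filename
fi
```

Usage: new.sh dm_sms.pinlog20121216231502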

I hope it helps.


Yes, it worked.

I'm not sure what you mean by 80%. Since you have the formatted output in a file, I believe this is something you can work out yourself.

1234, 2012-12-02 23:16:17 , 2012-12-02 23:16:17 

It is pretty straightforward: before printing the values, add a condition:

if ((per==80)||(per==100)||(per>100))
 print uid,dt,dt,per;

Adjust the condition to your requirements.
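Since per==100 || per>100 is the same test as per>=100, the condition can be shortened to per==80 || per>=100. This sketch (the function filter_percent is hypothetical) reads one value per line, checks that both forms agree, and prints the values that pass. Field values that look like numbers are compared numerically in awk, so a value such as 80.0 also matches per==80.

```shell
#!/bin/sh
# Reads one percentage per line (last field) and prints the ones that pass
# the thread's condition. It also checks that the shorter form
# per==80 || per>=100 gives the same answer as the original form.
filter_percent() {
    awk '{
        per = $NF    # numeric-looking fields compare as numbers
        orig  = ((per==80)||(per==100)||(per>100))
        short = (per==80 || per>=100)
        if (orig != short) { print "mismatch: " per; exit 1 }
        if (short) print per
    }'
}
```

For example, printf '%s\n' 50 80 99.9 150 | filter_percent prints 80 and 150.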

Thanks, very helpful.

---------- Post updated at 03:06 AM ---------- Previous update was at 03:02 AM ----------

nawk '/PIN_FLD_UNIQUE_ID/ { 

 uid=$NF;gsub(/"/,"",uid);
 print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt);

}' OFS=, $1 | nawk '!a[$0]++'
if ((per==80)||(per==100)||(per>100))
 print uid,dt,dt,per; > output_filename


The condition has to go inside the awk program, right before the print:

nawk '/PIN_FLD_UNIQUE_ID/ {
 uid=$NF;gsub(/"/,"",uid);
 if ((per==80)||(per==100)||(per>100))
   print uid,dt,dt,per;
} /PIN_FLD_PERCENT/ {
 per=$NF;
} /UnknownProgramName/ {
 dt=sprintf("%s %s",$2,$3);
 gsub(/\..*/,"",dt);
}' OFS=, filename | nawk '!a[$0]++'

Bipin, can you please explain what this awk program actually does? It is hard to understand on my own.

nawk '/PIN_FLD_UNIQUE_ID/ {                     # If line contains pattern: PIN_FLD_UNIQUE_ID
        uid=$NF;                                # Set uid = last field value
        gsub(/"/,"",uid);                       # Remove double quotes from uid value
        if ((per==80)||(per==100)||(per>100))   # If per == 80 OR per == 100 OR per > 100
                print uid,dt,dt,per;            # Print uid, dt (twice) & per, separated by OFS
} /PIN_FLD_PERCENT/ {                           # If line contains pattern: PIN_FLD_PERCENT
        per=$NF;                                # Set per = last field value
} /UnknownProgramName/ {                        # If line contains pattern: UnknownProgramName
        dt=sprintf("%s %s",$2,$3);              # Set dt = 2nd & 3rd field values separated by a space
        gsub(/\..*/,"",dt);                     # Remove the first period and everything after it
}' OFS=, filename | nawk '!a[$0]++'             # OFS=, sets the output field separator to a comma; read filename; the second nawk removes duplicate lines
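The last stage, nawk '!a[$0]++', is a common awk idiom for removing duplicate lines while keeping the first occurrence in its original position. This sketch (function names hypothetical) shows it next to a longer equivalent that spells out the logic.

```shell
#!/bin/sh
# Short form: a[$0] starts out empty (0), so !a[$0] is true the first time
# a line is seen and the line is printed; the ++ then marks it as seen, so
# later copies evaluate to false and are skipped.
dedup() {
    awk '!a[$0]++'
}

# Long form of the same logic.
dedup_verbose() {
    awk '{
        if (!($0 in seen)) {    # first time this exact line appears
            seen[$0] = 1        # remember it
            print               # print the line unchanged
        }
    }'
}
```

Unlike sort -u, this does not reorder the output, but it keeps every distinct line in memory, which should not be a problem for the 20k records mentioned at the start.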