text filtering

thibodc · June 30, 2012, 11:00am

INPUT FILE:

Date: 10-JUN-12 12:00:00
B 0: 00 00 00 00 10 00 16 28
B 120: 00 00 00 39 53 32 86 29
 
Date: 10-JUN-12 12:00:10
B 0: 00 00 00 00 10 01 11 22
B 120: 00 00 00 29 23 32 16 29
Date: 10-JUN-12 12:00:20
B 0: 00 00 00 00 10 02 17 29
B 120: 00 00 35 51 42 66 14
Date: 10-JUN-12 12:00:30
B 0: 00 00 00 00 10 03 61 42
B 120: 00 00 00 44 33 52 21 52
Date: 10-JUN-12 12:00:40
B 0: 00 00 00 00 10 04 11 22
B 120: 00 00 12 87 10 01 13 42
Date: 10-JUN-12 12:00:50
B 0: 00 00 00 00 10 05 15 24
B 120: 00 00 12 87 10 01 13 42
Date: 10-JUN-12 12:01:00
B 0: 00 00 00 00 10 06 11 22
B 120: 00 00 12 87 10 01 13 42

The pattern then repeats with new times and new data: the field after 10 on the B 0: line increments from 00 to 06, and 10 is always in the same field on the B 0: line.

What I would like the output to be (sometimes data is missing, so checks will have to be done):

I would like to find the B 0: line that contains 10 00, then print the line above it if it contains Date:, and if that checks out print the B 0: line that contains 10 00.

Right after these 2 lines are printed, I would like to find the B 0: line that contains 10 04, print the line above it if it contains Date:, then print the B 0: line that was just found, then print the line right below it if it contains B 120:.

Right after these 3 lines are printed, I would like to do the same for the B 0: line that contains 10 06: print the line above it if it contains Date:, then the B 0: line itself, then the line right below it if it contains B 120:.

I would like this done for the entire file (going on to the next set of data). Sorry this is fairly confusing.

Below is the output I would like.

Date: 10-JUN-12 12:00:00
B 0: 00 00 00 00 10 00 16 28
Date: 10-JUN-12 12:00:40
B 0: 00 00 00 00 10 04 11 22
B 120: 00 00 12 87 10 01 13 42
Date: 10-JUN-12 12:01:00
B 0: 00 00 00 00 10 06 11 22
B 120: 00 00 12 87 10 01 13 42

If the input file was larger the next set of data might look like this:

Date: 10-JUN-12 12:01:10
B 0: 00 00 00 00 10 00 46 78
Date: 10-JUN-12 12:00:50
B 0: 00 00 00 00 10 04 61 82
B 120: 00 00 14 77 10 01 19 02
Date: 10-JUN-12 12:02:10
B 0: 00 00 00 00 10 06 77 55
B 120: 00 00 82 87 70 01 13 42

FYI: I am using Solaris 10 and don't have any GNU products installed. I've tried using egrep without success. I'm thinking awk would be better, but I'm not sure. If you know how to solve this problem, your help would be appreciated.

elixir_sinari · June 30, 2012, 11:38am

Would you show a sample input file which has data missing? What I want to know is whether missing data means missing lines or some missing values in the lines. As far as I can make out from your description of the problem, whole lines could be missing and the only lines always present are the B 0: lines. But please confirm.

Check if this meets your requirement:

awk '/^B 0:/ && $7=="10" && $8=="04"{if(p ~ /^Date:/){print p;print $0;p=$0;getline;if($0 ~ /^B 120:/) print}
}    /^B 0:/ && $7=="10" && $8=="06"{if(p ~ /^Date:/){print p;print $0;p=$0;getline;if($0 ~ /^B 120:/) print}
}    /^B 0:/ && $7=="10" && $8=="00" {if(p ~ /^Date:/){print p;print $0}
}   {p=$0}' inputfile
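
The one-liner above depends on one idiom: keep the previous line in a variable and look at it when a B 0: line turns up. Here is a minimal sketch of just that idiom, stripped of the 10 00 / 10 04 / 10 06 checks. The helper name show_prev is made up for illustration, and the sample is the first two records from the question with the second B 0: line deliberately left out to mimic a missing line.

```shell
# show_prev: hypothetical helper, reads stdin.
# Prints each "B 0:" line together with the line before it,
# but only when that previous line is a Date: line.
show_prev() {
  awk '
    /^B 0:/ && prev ~ /^Date:/ { print prev; print }  # tested against every input line
    { prev = $0 }                                     # always runs, after the rule above
  '
}

# Second record has no "B 0:" line, so nothing is printed for it.
printf '%s\n' \
  'Date: 10-JUN-12 12:00:00' \
  'B 0: 00 00 00 00 10 00 16 28' \
  'B 120: 00 00 00 39 53 32 86 29' \
  'Date: 10-JUN-12 12:00:10' \
  'B 120: 00 00 00 29 23 32 16 29' | show_prev
```

awk tries every rule against every input line, so the output order follows the order of the lines in the file, not the order in which the B 0: rules are written. The only rule whose position matters is the one that saves the previous line: it has to come last so that the earlier rules still see the line before the current one.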

thibodc · June 30, 2012, 12:41pm

The Date: line is the only line that will always be there, with every field present. The B 0: and B 120: lines could be missing (the entire line, not individual fields). However, I want the search to key on the B 0: lines and find the other lines if they are there. So, if the B 0: line isn't there, I don't want to print anything. Thanks for the help. Will the above meet these criteria? Why is line 3 of your code last? Shouldn't it be first, since that should be the first thing printed? Sorry, I'm new to awk. Do I need the } in front of each line of code?

elixir_sinari · June 30, 2012, 2:04pm

In that case, the following should work (not tested fully though):

awk '/^Date:/{times++} /^Date:/ && times==8 {for(i=1;a[i];i++){
split(a[i],f)
if (a[i] ~ /^B 0:/ && f[7]=="10" && f[8]=="00") {if(a[i-1] ~ /^Date:/){print a[i-1];print a[i]}}
if (a[i] ~ /^B 0:/ && f[7]=="10" && f[8]=="04") {if(a[i-1] ~ /^Date:/){print a[i-1];print a[i];j=i+1;if(a[j] ~ /^B 120:/) print a[j]}}
if (a[i] ~ /^B 0:/ && f[7]=="10" && f[8]=="06") {if(a[i-1] ~ /^Date:/){print a[i-1];print a[i];j=i+1;if(a[j] ~ /^B 120:/) print a[j]}}
} times=1;i=0;for(j in a) delete a[j]} {a[++i]=$0} END{
if(times!=8) {
for(i=1;a[i];i++){
split(a[i],f)
if (a[i] ~ /^B 0:/ && f[7]=="10" && f[8]=="00") {if(a[i-1] ~ /^Date:/){print a[i-1];print a[i]}}
if (a[i] ~ /^B 0:/ && f[7]=="10" && f[8]=="04") {if(a[i-1] ~ /^Date:/){print a[i-1];print a[i];j=i+1;if(a[j] ~ /^B 120:/) print a[j]}}
if (a[i] ~ /^B 0:/ && f[7]=="10" && f[8]=="06") {if(a[i-1] ~ /^Date:/){print a[i-1];print a[i];j=i+1;if(a[j] ~ /^B 120:/) print a[j]}}}}}' inputfile

thibodc · June 30, 2012, 3:36pm

Could you explain the new code? Thanks again for your help.

elixir_sinari · July 1, 2012, 3:18am

As you already mentioned, you need your logic to be applied to sets of data. So, first we need to identify a set. As the Date: lines will always be there, the only way I could think of to identify a set is as a series of 7 Date: lines. The 8th occurrence of a Date: line therefore signals the end of one set (and the beginning of a new one), and so on. Along these lines, I am storing each line of a set in an array, and when a new set begins, I am applying your logic to the stored lines (of the previous set) and printing out the matching lines.
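
To make the set counting easier to see on its own, here is a rough sketch that does nothing except number the sets. The helper name label_sets and its per_set argument are invented for this illustration. It defaults to 7 Date: lines per set as described above, and the demo passes 2 only to keep the sample short. It uses awk -v, which as far as I know needs nawk or /usr/xpg4/bin/awk on Solaris 10 rather than the old /usr/bin/awk.

```shell
# label_sets: hypothetical helper, reads stdin.
# Prefixes every line with the number of the set it belongs to.
# $1 = number of Date: lines per set (defaults to 7).
label_sets() {
  awk -v per_set="${1:-7}" '
    /^Date:/ {
      times++
      if (times > per_set) times = 1   # one Date: too many means a new set has started
      if (times == 1) nset++           # first Date: of a set
    }
    { print nset + 0, $0 }             # lines before the first Date: get set number 0
  '
}

# Demo with 2 Date: lines per set: the third Date: line opens set 2.
printf '%s\n' \
  'Date: 10-JUN-12 12:00:00' \
  'B 0: 00 00 00 00 10 00 16 28' \
  'Date: 10-JUN-12 12:00:10' \
  'Date: 10-JUN-12 12:00:20' | label_sets 2
```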

Because the start of a new set is what marks the end of the previous one, the last set in the file is never followed by such a marker. So I need to repeat your logic (only for the last set) in the END section.
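
If your awk supports user-defined functions (on Solaris 10 that should mean nawk or /usr/xpg4/bin/awk, not /usr/bin/awk), the repeated logic can live in one function that is called both when a new set begins and in END. This is only a sketch of that variation, not a tested replacement: the names filter_sets and emit_set are made up here, and unlike the code above it loops over a line count instead of stopping at the first empty array element.

```shell
# filter_sets: hypothetical helper, reads stdin.
filter_sets() {
  awk '
    # Print the wanted lines from the buffered set a[1..n], then empty the buffer.
    function emit_set(   k, f) {
      for (k = 1; k <= n; k++) {
        if (a[k] !~ /^B 0:/) continue              # key on the B 0: lines only
        split(a[k], f)
        if (f[7] != "10" || a[k-1] !~ /^Date:/) continue
        if (f[8] == "00") {
          print a[k-1]; print a[k]
        } else if (f[8] == "04" || f[8] == "06") {
          print a[k-1]; print a[k]
          if (a[k+1] ~ /^B 120:/) print a[k+1]     # optional line below
        }
      }
      for (k in a) delete a[k]
      n = 0
    }
    /^Date:/ && ++times == 8 { emit_set(); times = 1 }  # 8th Date: closes the set
    { a[++n] = $0 }                                     # buffer the current line
    END { emit_set() }                                  # the last set has no closing Date:
  '
}

# Demo on a shortened sample; the final record has no B 120: line.
printf '%s\n' \
  'Date: 10-JUN-12 12:00:00' \
  'B 0: 00 00 00 00 10 00 16 28' \
  'B 120: 00 00 00 39 53 32 86 29' \
  'Date: 10-JUN-12 12:00:40' \
  'B 0: 00 00 00 00 10 04 11 22' \
  'B 120: 00 00 12 87 10 01 13 42' \
  'Date: 10-JUN-12 12:01:00' \
  'B 0: 00 00 00 00 10 06 11 22' | filter_sets
```

On a real file it would be run as filter_sets < inputfile. Since END always calls emit_set on whatever is still buffered, no separate times check is needed there.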