I have a file with two fields. The first field repeats itself for quite a while but the second field changes. What I want to do is to go through the first column until its value changes (and while it doesn't, verify that the second field is in a sequence from 0-15).
awk '
ch!=$1{ch=$1;seq=$2} #initialize with new channel
ch==$1{ #channel same as stored
if((seq++%16)!=$2) #increment and cycle the counter; compare
cntr[$1]++
}END{
for(i in cntr) {
print "Channel " i " has " cntr[i] " discontinuities"
}
}' input
Note that the output will be in arbitrary order, since awk does not guarantee the order in which for(i in cntr) visits the keys.
To sort by channel, pipe the awk output to sort:
awk '{...}' input | sort -n -k2
To sort by number of discontinuities, sort by the fourth field:
awk '{...}' input | sort -n -k4
It doesn't seem to work, it doesn't print anything... are the first couple of instructions supposed to be wrapped in a BEGIN statement?
---------- Post updated at 11:42 AM ---------- Previous update was at 11:34 AM ----------
Thank you!!! It works... but there's a tiny problem. I wanted to specify that two consecutive lines having the same value in the second field shouldn't be seen as a discontinuity.
Lines 5, 6, 7 shouldn't be seen as a discontinuity since I have a lot of those in the input file
---------- Post updated at 11:54 AM ---------- Previous update was at 11:42 AM ----------
Actually, I think it's only counting the number of times a channel is present in the first field... because I have another script that does that and returns the stats, and they're both giving the same results now...
Channel 160 has 13 discontinuities
Channel 162 has 4 discontinuities
Normally there should be 2 discontinuities in channel 160 and 2 in channel 162. Is there an if statement missing, one where we check if the channel is the same as stored?
I re-checked the other code (the one provided by UVI) again with this smaller input file and it doesn't work properly, so I still have the same problem.
awk '
ch!=$1{ch=$1;seq=$2} #initialize with new channel
ch==$1{ #channel same as stored
if(seq==$2) next; #if same, skip to next line
else if((++seq%16)!=$2) { #increment and cycle the counter; compare
cntr[$1]++
seq=$2 #reset seq
#print "Disc. " $0 #debug; uncomment to check what was grabbed
}
}END{
for(i in cntr) {
print "Channel " i " has " cntr[i] " discontinuities"
}
}' input
Thanks for the reply. It works well overall but the problem is that it seems to detect every second repeated number as a discontinuity as well, so say that this is the input (if the sequence was from 0-3 instead of 0-15):
400 0
400 1 xx
400 1 xx
400 2
400 3
400 0 //
400 0 //
400 1
400 2
400 3 xx
400 3 xx
400 0
400 1 //
400 1 //
The places marked with // are viewed as a discontinuity. I think there's a problem with the storage of "seq" right after detecting a repeated number, but I haven't been able to fix it.
I've attached the input file I'm using to test it, and the result of the discontinuities is:
$ cat test.sh
#!/bin/sh
awk '
ch!=$1{ch=$1;seq=$2} #initialize with new channel
ch==$1{ #channel same as stored
if(seq==$2) next; #if same, skip to next line
else if((++seq%16)!=$2) { #increment and cycle the counter; compare
cntr[$1]++
seq=$2 #reset seq
print "Disc. " $0 " seq: " seq #debug output; comment out once the results look right
}
}END{
for(i in cntr) {
print "Channel " i " has " cntr[i] " discontinuities"
}
}' < "$1"
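The lines marked // get flagged because seq is never wrapped back into the 0-15 range. The script applies %16 only inside the comparison, so after the first wrap-around seq holds 16 (then 17, and so on) and the repeat check seq==$2 can no longer match. One possible fix, as a minimal sketch: compute the expected value as (seq+1)%mod and always store the value that was just read. The count_disc wrapper and the mod parameter are names made up here for illustration (mod defaults to 16; pass 4 to try the 0-3 example above). This has been checked only against the small sample in this thread, not against the attached file.

```shell
# Sketch only: count_disc and mod are illustrative names, not from the thread.
# Reads "channel value" lines on stdin; optional first argument is the cycle length.
count_disc() {
    awk -v mod="${1:-16}" '
    NR == 1 || $1 != ch {        # first line, or the channel changed
        ch = $1; seq = $2        # remember the channel and its starting value
        next
    }
    $2 == seq { next }           # repeated value: not a discontinuity
    {
        if ((seq + 1) % mod != $2)   # expected successor, wrapped into 0..mod-1
            cntr[$1]++
        seq = $2                 # always resync, so seq never leaves 0..mod-1
    }
    END {
        for (i in cntr)
            print "Channel " i " has " cntr[i] " discontinuities"
    }'
}
```

Usage would be something like count_disc < input | sort -n -k2. As in the earlier versions, a discontinuity resyncs seq to the current value, so only the jump itself is counted, and channels without any discontinuity are not printed.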