Below is a sample of a comma-separated file. The first column contains numbers, and the last number is 13.
1,4
2,6
3,7
5,2
6,5
7,5
8,65
9,10
11,78
13,2
What I want to know is which numbers are missing from the range 1 to 13 (13 being the last number in column 1). My real file has more than 5 million lines.
It works with the sample file but not with the real file.
The real file has 5,440,177 lines and the last number in column 1 is 5440255. Subtracting one from the other, 78 numbers are missing from column 1.
But with your script I get more than 33 million lines of output, and I stopped it because it seemed to have entered an infinite loop.
Is there a way to preload an array with the values 1 to N (N=13 in this case, N=5440255 in the real file), so that I can check which of its values are not in column 1?
I've tried your script. The expected number of missing values in the real file is 78, but it prints 108 numbers.
Is there a way to compare column 1 with a preloaded array that contains the elements from 1 to N, in order to print the array values that are not present in column 1?
Yes, those wrong values are there. Because of them, I'd like to preload an array of N consecutive elements and compare it with column 1, printing only the elements that are missing, but I don't know how to preload an array that way.
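A minimal sketch of the preloaded-array idea asked about above, not the script that was actually posted in this thread. The function name missing_preload and the array name want are made up for illustration, and N is assumed to be known in advance and passed as the second argument. First fields that are not plain digits are ignored rather than compared.

```shell
# Hypothetical helper: print every integer in 1..N that is absent from column 1.
# $1 = input file (comma separated), $2 = N, the upper bound of the expected range.
missing_preload() {
    awk -F, -v n="$2" '
        BEGIN {
            for (i = 1; i <= n; i++)      # preload the array with the keys 1..N
                want[i] = 1
        }
        $1 ~ /^[0-9]+$/ {                 # only consider first fields made of digits
            delete want[$1 + 0]           # this value is present, so it is not missing
        }
        END {
            for (i = 1; i <= n; i++)      # walk 1..N in order and print what is left
                if (i in want) print i
        }
    ' "$1"
}
```

With the 10-line sample from the first post saved as sample.csv, calling missing_preload sample.csv 13 prints 4, 10 and 12, one per line. The array holds N entries in memory, so for N=5440255 this trades memory for not depending on the order of column 1.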
Thanks for the help. It seems to work fine. It gives me 93 values, and the count seems right since:
NL = Number of lines = 5440172
LN = Last number in column 1 = 5440255
WL = Wrong lines = 10
Then (LN - NL) + WL = 83 + 10 = 93.
PS: Could you explain the logic of your program, please?
Since the first example was comma delimited, -F, (field delimiter) was used, but it is not needed for the file posted.
length($1)<8 uses only records whose field 1 has a length < 8
for (i=a+1; i<$1; i++) print i; a=$1 prints every value of i from a + 1 (a holds field 1 of the last record) up to, but not including, the value of the first field, then stores field 1 in the variable a
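The pieces explained above can be put together roughly as follows. This is a reconstruction from the explanation, not necessarily the exact script that was posted; the function name missing_gaps is made up, and the sketch assumes column 1 is in ascending order and that every record passing the length test has a numeric first field.

```shell
# Hypothetical wrapper around the logic described above.
# $1 = input file; column 1 is assumed to be sorted in ascending order.
missing_gaps() {
    awk -F, '
        length($1) < 8 {                   # skip records whose first field is 8+ characters long
            for (i = a + 1; i < $1; i++)   # every integer between the previous value and this one
                print i
            a = $1                         # remember this value for the next record
        }
    ' "$1"
}
```

On the first record a is still unset, which awk treats as 0, so the scan starts at 1. Only one value is kept in memory at a time, which suits a file with millions of lines.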
Try this. You did not supply the real input, and did not even mention that the length of $1 should not exceed 7. You were getting wrong results; that does not mean it entered an infinite loop. Also, in #1 you showed that your input is comma separated, but the real input is not.
The missing numbers and a running count are shown below. Change print x,++n to print x once testing is done.
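For illustration only, a sketch of how a print x,++n line might sit inside such a script. The original code is not reproduced here, so everything apart from x and n (the function name missing_with_count, the variable prev, the length test) is an assumption, as is the ascending order of column 1.

```shell
# Hypothetical variant that prints each missing number followed by a running count.
# $1 = input file; column 1 is assumed to be sorted in ascending order.
missing_with_count() {
    awk -F, '
        length($1) < 8 {                        # ignore first fields longer than 7 characters
            for (x = prev + 1; x < $1; x++)
                print x, ++n                    # change to "print x" once testing is done
            prev = $1                           # keep the last accepted value
        }
    ' "$1"
}
```

The second output column is only a check: its last value should equal the number of missing values you expect.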
Thanks for your help. I provided a simple sample since the logic should work both for a small sample and in general. The handling of a maximum length of 7 for column 1 was introduced by rdrtx1, since he found 10 wrong records that I didn't know existed.
The real file is comma delimited. I only uploaded the first column since the file is too big with more columns, and the script would be the same; only the field separator needed to be removed.
Your last code seems to work fine with the real file now, and the addition of the count is great.