Find the missing sequence

Dear all

I have a file with at most 24 entries. I want to find which sequence numbers are missing.

The file looks like this:

df00231587.dat
df01231587.dat
df03231587.dat
df05231587.dat
.
.
.
df23231587.dat

The changing part of the name is the sequence 00-23, so I would like to find out which sequence numbers are missing. In the file above, for example, 02 and 04 are missing.

So the output should look like this:

02231587
04231587

are missing. The leading zero is important.

Thanks in advance.

Hello sagar_1986,
I have a few questions to pose in response first:

  • What have you tried so far?
  • What output/errors do you get?
  • What OS and version are you using?
  • What are your preferred tools? (C, shell, perl, awk, etc.)
  • What logical process have you considered? (to help steer us to follow what you are trying to achieve)

Most importantly, what have you tried so far?

There are probably many ways to achieve most tasks, especially if there are only a few lines (and therefore I/O is not much of a worry). Giving us an idea of your style and thoughts will help us guide you to the answer most suitable for you, so that you can adapt it to your needs in future.

We're all here to learn and getting the relevant information will help us all.

Regards,
Robin

Hello rbatte1,

It's a shell script. I am reporting missing-file status for approximately 350 file types.
Until now I was reporting just the count and the missing count for the month, but now I want to report the exact sequence numbers of the missing files.
I have tried putting all the sequence numbers in one file and comparing against it, which does give the output, but my requirement is to do it without comparing against a reference file. Can that be done?
I can get the sequence numbers (00-23) using the cut command, but my exact problem is how to find the missing sequence numbers and put them into the desired format stated above.

There is also one more problem I would like to discuss. I am learning Unix, and so far my biggest problem has been transposing rows into columns. I have already posted a thread about it, but I still don't understand how to do it.

My output looks like this:

I want to transpose the rows into columns. Please explain how to do it.
Note: the number of rows may vary, so the transpose script should first read how many rows there are and then convert them into columns.

Thanks in advance

awk -F '.' 'NR == 1{t=substr($1,5); a=00}
  {b=substr($1, 3, 2); for(i=a+1; i<b+0; i++)
    printf "%02d%d\n", i, t; a=b}
  END {for(i=a+1; i<=23; i++) printf "%02d%d\n", i, t}' file

Dear SriniShoo,

Thanks for the reply. It works fine for me, but could you please explain the code step by step so that I can rewrite it as per my requirements?

In the code:

  1. I extract the 3rd and 4th characters (the two-digit sequence number).
  2. Since the list is incremental, I check whether the current number is one more than the previous number.
  3. If not, I print all the numbers between the previous and the current number.
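A minimal sketch of those three steps, written as a shell function (the name missing_sorted is hypothetical). Like SriniShoo's script, it assumes the list is sorted by hour. Two small deviations, both assumptions of this sketch: the previous hour starts at -1 so that a missing 00 is also reported, and the suffix is printed with %s rather than %d so that a suffix starting with 0 keeps its leading zero.

```shell
# Hypothetical helper: print missing hours from a list sorted by hour.
missing_sorted() {
  awk -F'[.]' '
    BEGIN   { prev = -1 }                  # start below 00 so a missing 00 is reported
    NR == 1 { t = substr($1, 5) }          # suffix after "df" and the 2-digit hour
    {
      cur = substr($1, 3, 2) + 0           # step 1: characters 3 and 4 as a number
      for (i = prev + 1; i < cur; i++)     # steps 2-3: every hour skipped since prev
        printf "%02d%s\n", i, t            # %s keeps any leading zero in the suffix
      prev = cur
    }
    END { for (i = prev + 1; i <= 23; i++) printf "%02d%s\n", i, t }   # hours after the last entry
  ' "$1"
}
```

Usage: missing_sorted list.txt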

Another way to accomplish the task:

Input file:

cat list.txt
df00231587.dat
df01231587.dat
df03231587.dat
df05231587.dat
df06231587.dat
df07231587.dat
df08231587.dat 
df09231587.dat 
df11231587.dat 
df10231587.dat 
df19231587.dat 
df18231587.dat 
df17231587.dat 
df16231587.dat 

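If you run the script below:

awk -F"." '{
	b = substr($1, 5)          # suffix after the hour
	a[NR] = substr($1, 3, 2)   # two-digit hour from each line
	n = NR
}
END {
	for (l = 0; l <= 23; l++) {
		s = sprintf("%02d", l)
		c = 0
		for (i = 1; i <= n; i++)
			if (s != a[i])
				c++        # count entries that do not match this hour
		if (c == n)        # no entry matched, so this hour is missing
			print s b
	}
}' list.txt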

The output is:

02231587
04231587
12231587
13231587
14231587
15231587
20231587
21231587
22231587
23231587


These are the missing files from list.txt.

Hi sagar_1986,
Note that we are ignoring the second issue you raised in message #3 in this thread. We don't need to discuss one issue in two threads; it makes it much too hard for the volunteers trying to help you if you discuss the same issue in two places.

Note that SriniShoo's script will not print the "00" line if it isn't present in the input file. Assuming that the input file is sorted by hour, this seems to do what SriniShoo was trying to do:

awk -F'[.]' '
NR == 1 {
	t = substr($1,5)
	a = 0
}
{	b = substr($1, 3, 2)
	for(i = a; i < b + 0; i++)
		printf("%02d%d\n", i, t)
	a = b + 1
}
END {	for(i = a; i <= 23; i++)
		printf("%02d%d\n", i, t)
}' file

If I were going to try this using bharat1211's approach (which would be required if the input is not sorted), I would simplify it to something more like:

awk -F'[.]' '
FNR == 1 {
	t = substr($1, 5)
}
{	h[substr($1, 3, 2)]
}
END {	for(i = 0; i < 24; i++)
		if(!((hour = sprintf("%02d", i)) in h))
			print hour t
}' file

If file contains:

df01231587.dat
df03231587.dat
df04231587.dat
df06231587.dat 
df07231587.dat 
df08231587.dat 
df09231587.dat 
df10231587.dat 
df11231587.dat 
df12231587.dat 
df13231587.dat 
df14231587.dat 
df16231587.dat
df17231587.dat 
df18231587.dat 
df19231587.dat 
df22231587.dat

both of these scripts produce the output:

00231587
02231587
05231587
15231587
20231587
21231587
23231587

For any readers who want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk .

Dear Don Cragun,

Thank you very much for your suggestion. I would also like to ask how to get this output on a single line, and how to redirect the output of print to a file.
I want output like this

which is redirected to a file.
I have tried this, but it's not working:

m_ch=` awk -F'[.]' '
  FNR == 1 {
  t = substr($1, 5)
  }
  {       h[substr($1, 3, 2)]
  }
  END {   for(i = 0; i < 24; i++)
  if(!((hour = sprintf("%02d",i "\c")) in h))
  print hour t
  }' file`

echo "$m_ch \c" >> files.txt

Changing my awk script to:

awk -F'[.]' '
FNR == 1 {
	t = substr($1, 5)
}
{	h[substr($1, 3, 2)]
}
END {	for(i = 0; i < 24; i++)
		if(!((hour = sprintf("%02d", i)) in h))
			out = (out != "" ? out "," : "") hour t
	print out
}' file >> files.txt

adds the line:

00231587,02231587,05231587,15231587,20231587,21231587,23231587

to the end of the file named files.txt .
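If you would rather keep the one-hour-per-line awk script unchanged, another way to join its output with commas is to pipe it through the standard paste utility in serial mode (paste -s). This is a sketch; the function name missing_csv is hypothetical, and the awk part is the array-based script from above.

```shell
# Hypothetical helper: missing hours, one per line from awk, joined by paste.
missing_csv() {
  awk -F'[.]' '
    FNR == 1 { t = substr($1, 5) }        # suffix after the 2-digit hour
    { h[substr($1, 3, 2)] }               # remember every hour that is present
    END {
      for (i = 0; i < 24; i++) {
        hour = sprintf("%02d", i)
        if (!(hour in h)) print hour t    # one missing hour per line
      }
    }
  ' "$1" | paste -s -d ',' -              # join all lines into one, comma-separated
}
```

Usage: missing_csv file >> files.txt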


Dear Don Cragun,

The script prints the output properly, but the redirected output (the file) is not OK. The output of print is

but in the file the output is

I also want to mention that I am running this query on a number of pieces of terminal equipment, and the captured output is not as expected. It looks like this (attached herewith), whereas it should display only the missing sequence numbers.

Please help me out.

Your file is in DOS format. Convert to UNIX format first:

tr -d '\r' < file > newfile
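A sketch of doing the same clean-up on the fly, without writing newfile: pipe tr into awk, which then reads its standard input. The function name missing_hours_dos is hypothetical, and the awk part is the array-based script from above.

```shell
# Hypothetical helper: strip DOS carriage returns, then list missing hours.
missing_hours_dos() {
  tr -d '\r' < "$1" | awk -F'[.]' '
    NR == 1 { t = substr($1, 5) }         # suffix after the 2-digit hour
    { h[substr($1, 3, 2)] }               # remember every hour that is present
    END {
      for (i = 0; i < 24; i++) {
        hour = sprintf("%02d", i)
        if (!(hour in h)) print hour t
      }
    }'
}
```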

Thanks Scrutinizer,
I got it. Please let me know: will this script not work if no file is present? That is, if for some reason my terminal equipment is not working, no files will be available, so I should get all files reported as missing. But as far as I can see, this script does not work if no data is available.

You can test if the file is present:

if [ -f /path/to/file ]; then
  : # process the file here
fi
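To get all 24 hours reported when the list is missing, one option (a sketch, not the only way) is to fall back to /dev/null, so awk sees empty input and every hour ends up in the missing list. Because the suffix (231587 in the examples) is normally read from the first file name, it cannot come from a file that does not exist; in this sketch it is assumed to be passed in by the caller via awk -v. The function and variable names are hypothetical.

```shell
# Hypothetical helper: list missing hours, or all 24 if the list file is absent.
# Assumption: the caller knows the suffix, since it cannot be read from a missing file.
report_missing() {
  list=$1                              # path to the list of .dat file names
  suffix=$2                            # part expected after the 2-digit hour
  [ -f "$list" ] || list=/dev/null     # no file: treat as "nothing received"
  awk -F'[.]' -v t="$suffix" '
    { h[substr($1, 3, 2)] }            # remember every hour that is present
    END {
      for (i = 0; i < 24; i++) {
        hour = sprintf("%02d", i)
        if (!(hour in h)) print hour t
      }
    }
  ' "$list"
}
```

Usage: report_missing list.txt 231587 >> files.txt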