Hello all,
I have several directories with a sequence of files like this
IM-0001-0001.dcm
IM-0001-0002.dcm
IM-0001-0003.dcm
IM-0001-0004.dcm
IM-0001-0005.dcm
I would like to print out the name of the file that is missing.
I currently do this in the following inefficient way, and I am wondering whether you could suggest a better way that works across multiple directories.
ls -1 *.dcm | awk -F"-" '{print $3}' > ori.txt
[]$ cat ori.txt
0001.dcm
0002.dcm
0004.dcm
0005.dcm
Create another list with all files that are supposed to be there
[]$ cat main.txt
0001.dcm
0002.dcm
0003.dcm
0004.dcm
0005.dcm
[]$ diff ori.txt main.txt
2a3
> 0003.dcm
It would be good if I could display the full name of the missing file.
Thanks,
The trouble with detecting holes in sequences is, how do you detect a hole at the beginning, or the end? Unless you really do know what files are supposed to be there, you're going to be reduced to guessing in some situations no matter what.
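If you do know the range, say the counter is supposed to run from 1 to 5, a plain loop can test for each expected name, and that also catches files missing at either end. A minimal sketch, assuming the IM-0001- prefix and 4-digit counter from the example above (check_range is a made-up name, and the range is something you have to supply yourself):

```shell
# Hypothetical helper: report every expected IM-0001-NNNN.dcm that is absent.
# Usage: check_range DIR FIRST LAST
# FIRST and LAST are plain decimal numbers without leading zeros (1, not 0001).
check_range() {
    dir=$1 first=$2 last=$3
    n=$first
    while [ "$n" -le "$last" ]; do
        # Build the name the file is supposed to have, zero-padded to 4 digits.
        f=$(printf 'IM-0001-%04d.dcm' "$n")
        [ -e "$dir/$f" ] || echo "Missing $dir/$f"
        n=$((n + 1))
    done
}
```

With the files from the first post, check_range . 1 5 prints "Missing ./IM-0001-0003.dcm". For several directories, call it once per directory, for example: for d in */; do check_range "${d%/}" 1 5; done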
Will there ever be more than one sequence in this folder, or just the one?
This can detect some kinds of sequences. It assumes anything with digits and an extension is part of a sequence, and tells different sequences apart by the string before the last set of digits and by the extension. It doesn't need the files in sorted order.
$ cat missing.awk
X=match($0, /[0-9]+\.[^.]*$/) {
    Y=match($0, /\.[^.]*$/);
    PFIX=substr($0, 1, X-1);   # IM-0001-
    EXT=substr($0, Y);         # .dcm
    VAL=substr($0, X, Y-X);    # 0003
    # To check if the number of digits is changing.
    DIGITS[PFIX,EXT,length(VAL)]++;
    # The +0 is to guarantee a numeric comparison, not alphabetic, so "01" < "2".
    # Testing with "in" keeps a counter value of 0 from looking like "not set yet".
    if((!((PFIX,EXT) in SMIN)) || (SMIN[PFIX,EXT]>(VAL+0))) SMIN[PFIX,EXT]=VAL+0;
    if((!((PFIX,EXT) in SMAX)) || (SMAX[PFIX,EXT]<(VAL+0))) SMAX[PFIX,EXT]=VAL+0;
    F[PFIX,EXT,VAL]=1;
}
END {
    for(X in SMAX)
    {
        split(X, A, SUBSEP);
        PFIX=A[1]; EXT=A[2];
        DC=0;
        DMAX=0;
        for(Z in DIGITS)
        {
            split(Z, A, SUBSEP);
            if((A[1] != PFIX) || (A[2] != EXT)) continue;
            if(A[3] > DMAX) DMAX=A[3];
            DC++;
        }
        if(DC == 1) CMDSTR="%0" DMAX "d"
        else CMDSTR="%d"
        for(N=SMIN[X]+0; N<=(SMAX[X]+0); N++)
        {
            VAL=sprintf(CMDSTR, N);
            if(!F[PFIX,EXT,VAL])
                print "Missing", PFIX VAL EXT;
        }
    }
}
$ touch IM-0001-{0001..0005}.dcm file-{8..15}.dat
$ rm IM-0001-0003.dcm file-9.dat file-11.dat
$ ls | awk -f missing.awk
Missing file-9.dat
Missing file-11.dat
Missing IM-0001-0003.dcm
$
Alternatively try this less general approach:
printf "%s\n" *.dcm | awk -F'[-.]' '$3>p+1{for(i=p+1;i<$3;i++){s=$0; sub($3"."$4,sprintf("%04d",i)"."$4,s); print s}}{p=$3}'
This assumes that all files have a fixed-length, zero-padded counter in the third field, that they have an extension in the fourth field, and that all fields (and field separators) other than the third field are identical. The zero padding also ensures that the wildcard expands in the right order.
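To run this kind of check over multiple directories, the counter has to start fresh in each one, so one option is a subshell and a separate awk run per directory. A rough sketch under the same assumptions (names of the form prefix-series-counter.ext with a 4-digit counter); find_gaps is a made-up name, and instead of sub() it rebuilds the name from its fields, which only works while the names have exactly that shape:

```shell
# Hypothetical wrapper: print the full path of every gap in each directory.
# Usage: find_gaps DIR...
find_gaps() {
    for d in "$@"; do
        (
            cd "$d" || exit
            # Expand the wildcard once; skip directories without .dcm files.
            set -- *.dcm
            [ -e "$1" ] || exit 0
            printf '%s\n' "$@" |
            awk -F'[-.]' -v dir="${d%/}" '
                # A jump of more than 1 since the previous counter means a gap.
                $3 > p+1 {
                    for (i = p+1; i < $3; i++)
                        printf "%s/%s-%s-%04d.%s\n", dir, $1, $2, i, $4
                }
                { p = $3 }'
        )
    done
}
```

Calling find_gaps */ checks every subdirectory of the current one. As with the one-liner, a file missing from the end of a sequence goes unnoticed, and directory names containing backslashes would be altered by awk's -v option.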
Thanks a lot for your help guys.
Scrutinizer: It works great. I will use the code tags from next time on.