I am trying to count files by type across many folders, each of which contains thousands of files.
Since some files do not have extensions, I have to classify them by file name using the criteria below.
Count Total Files starting with --> "^ERROR"
Count Total Files starting with --> "^Runtime"
Count everything else, including files without any extension
Sample input files in each sub-folder:
RuntimeProperties_296090758.xls
RuntimeProperties_296409844.xls
ERROR_261218287_296336046_20161213_101129
261218194_296090758_20161212_120448
RuntimeProperties_296413261.xls
ERROR_261218194_296090758_20161212_120448
261218287_296409844_20161213_120039
261218287_296336046_20161213_101129
ERROR_261218287_296409844_20161213_120040
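To make the expected classification concrete, here is a small sketch that applies the three criteria to the sample names above. The function name classify_names is only an illustration, and it assumes one bare file name per line on stdin (no directory part, no newlines inside names).

```shell
# classify_names: hypothetical helper, reads one file name per line on stdin
# and prints "ERROR,Runtime,others" counts as a single CSV line.
classify_names() {
    awk '
        /^ERROR/   { err++; next }      # names starting with ERROR
        /^Runtime/ { run++; next }      # names starting with Runtime
                   { other++ }          # everything else
        END { printf "%d,%d,%d\n", err, run, other }'
}

# The nine sample names from the listing above.
printf '%s\n' \
    RuntimeProperties_296090758.xls \
    RuntimeProperties_296409844.xls \
    ERROR_261218287_296336046_20161213_101129 \
    261218194_296090758_20161212_120448 \
    RuntimeProperties_296413261.xls \
    ERROR_261218194_296090758_20161212_120448 \
    261218287_296409844_20161213_120039 \
    261218287_296336046_20161213_101129 \
    ERROR_261218287_296409844_20161213_120040 |
    classify_names
# prints: 3,3,3
```

So for this sample folder I would expect 3 ERROR files, 3 Runtime files and 3 others.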
Since I have to count this in a 12TB root folder with 6800 sub-folders, each holding thousands of files, the solution must not run into buffer overflow, out-of-memory, or "too many files" situations. It should also be fast.
I think either perl or awk can do this with the help of xargs, but I am not entirely sure how.
# I wish something like this could print counts for each sub-folder.
# (rootDIR is a placeholder for the 12TB root folder.)
for targetDIR in "$rootDIR"/*/; do
    find "$targetDIR" -type f | awk -F/ -v td="$targetDIR" '
        $NF ~ /^ERROR/   { CNT_ERROR++; next }     # $NF is the base name of the file
        $NF ~ /^Runtime/ { CNT_Runtime++; next }
                         { CNT_Others++ }
        END { printf "%s,%d,%d,%d\n", td, CNT_ERROR, CNT_Runtime, CNT_Others }'
done
Then I can get overall counts myself.
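One way this might be done in a single pass is sketched below, not tested on a tree of that size. It runs one find over the whole root and streams the paths into one awk process, so memory grows with the number of sub-folders rather than the number of files, and no process is started per file. The function name count_by_prefix is made up for illustration. The sketch assumes the 6800 sub-folders sit directly under the root, that file names contain no newlines, and that files lying directly in the root can be ignored.

```shell
# count_by_prefix ROOT   (hypothetical helper name)
# Prints "subfolder,ERROR,Runtime,others" for each first-level sub-folder
# of ROOT, followed by a TOTAL line.
count_by_prefix() {
    ( cd "$1" && find . -type f ) |
    awk -F/ '
        NF < 3 { next }                 # skip files directly in ROOT ("./file")
        {
            dir  = $2                   # first-level sub-folder ("./dir/...")
            name = $NF                  # base name of the file
            if (!(dir in seen)) { seen[dir] = 1; order[++n] = dir }
            if (name ~ /^ERROR/)        { err[dir]++ }
            else if (name ~ /^Runtime/) { run[dir]++ }
            else                        { other[dir]++ }
        }
        END {
            for (i = 1; i <= n; i++) {
                d = order[i]
                printf "%s,%d,%d,%d\n", d, err[d], run[d], other[d]
                tot_err += err[d]; tot_run += run[d]; tot_other += other[d]
            }
            printf "TOTAL,%d,%d,%d\n", tot_err, tot_run, tot_other
        }'
}
```

It would be called as count_by_prefix followed by the root path. Sub-folders are printed in the order find first reaches them, so the output can be piped through sort if a stable order is needed.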