Help with Script using awk

Hi All,

I need to create a script that looks through a very long list of files and checks each file for a specific count of a special character, the comma (","): there must be no more than 14 per line in each file.

I have copied 2 files for testing the script.

dk@server: /export/home/test> ls -alt DK* | awk '{ print $9 }'
DK_TEST2.dat
DK_TEST.dat
dk@server: /export/home/test>

these files look something like this (not the exact content, but close, to protect the data)

,,30035111,DAVY KELLY JOB NO. 17748000,16940000,04/09/2017,04/09/2017,P001111,10.2,10,1.02,Davy Kelly,1,,
,,30035111,DAVY KELLY JOB NO. 17748000,16940000,04/09/2017,04/09/2017,P001111,13.62,6,2.27,Davy Kelly,2,,
,,30035111,DAVY KELLY JOB NO. 17748000,16940000,04/09/2017,04/09/2017,P029111,16.91,1,16.91,Davy Kelly,3,,
,,30035111,DAVY KELLY JOB NO. 17748000,16940000,04/09/2017,04/09/2017,P029111,16.91,1,16.91,Davy Kelly,4,

I am looking to:
make a list of the filenames in the path that match DK*, saved as LIST
for each FILE in LIST
check via awk for more than 14 commas; if true, print FILENAME to an OUTPUT file
mail the OUTPUT file to myself.

#!/bin/sh
###
###  Name:         check_excesiveComma.sh
###  Path:          /export/home/test
###  Description:  Copy a list of Filenames that have more than 14 commas
###  Version:      1.1
###  Author:       DKelly
###

FPATH=/export/home/test
DAT=`date`
MSG=/tmp/email.txt
OUTPUT=/tmp/output.txt
LIST=/tmp/list.txt

touch $LIST
ls -alt DK* | awk '{ print $9 }' >> $LIST

###  Change Directory to $FPATH
cd $FPATH

###  Check all files for more than 14 commas and output the filenames
for FILE in $LIST
do
    if (`awk 'BEGIN -F"," {NF > 14}`) then echo $FILE >> $OUTPUT
done

###  Mail out the message and delete the output and message
cat $MSG | /usr/bin/mailx -s"Graftons Files that might cause issues" davy

###  remove files.
rm $MSG
rm $LIST
rm $OUTPUT

###  end script

Don't worry, I know I have a variable for MSG and that is really the one I want to email...

I am not really getting AWK very well... I need to look in each file in the list, count the commas on each line, and echo the actual filename if any of its lines has more than 14 commas.

Please could you kind people point me in the right direction.

davy

First question: Do all the files contain multiple lines, or just one line per file? If the former, must all the lines comply with the number of commas?

Assuming the latter, this might be a little better:

#!/bin/sh
###
###  Name:         check_excesiveComma.sh
###  Path:          /export/home/test
###  Description:  Copy a list of Filenames that have more than 14 commas
###  Version:      1.1
###  Author:       DKelly
###

FPATH=/export/home/test
DAT=`date`
MSG=/tmp/email.txt
OUTPUT=/tmp/output.txt

###  Change Directory to $FPATH
cd $FPATH

LIST=`ls -1 DK* 2>/dev/null`

###  Check all files for more than 14 commas and output the filenames
for FILE in $LIST
do
    [ `awk -F, '{print NF;}' $FILE` -gt 14 ] && echo $FILE >> $OUTPUT
done

###  Mail out the message and delete the output and message
cat $MSG | /usr/bin/mailx -s"Graftons Files that might cause issues" davy

###  remove files.
rm $MSG
rm $OUTPUT

###  end script

If the former, a bit more work will need to be put into it.

I have fixed a couple of obvious bugs in your code. Please, never use ls -l | awk '{ print $x;}' . It is unnecessary, and the column layout depends on the version of ls you are using.
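To illustrate the point with a throwaway demo (not part of the script; the /tmp path below is made up): a filename containing a space is perfectly legal on Unix, and awk's default whitespace splitting silently truncates it, while the glob already yields the full names with no parsing at all.

```shell
#!/bin/sh
# Demo: parsing ls -l output loses filenames that contain spaces.
dir=/tmp/lsdemo.$$            # throwaway directory for the demo
mkdir "$dir" && cd "$dir" || exit 1

touch "DK TEST3.dat"          # a legal filename with a space in it

# awk splits on whitespace, so column $9 holds only "DK" here.
parsed=`ls -l DK* | awk '{ print $9 }'`
echo "parsed from ls -l: $parsed"

# The glob itself expands to the full, unmangled names.
for f in DK*
do
    echo "from glob: $f"
done

cd / && rm -rf "$dir"
```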

Andrew

1 Like

Hi Andrew,

Thanks for getting back to me.

To answer your question:
in the directory there could potentially be thousands of files.
each file will have multiple lines
If any of the lines in the file contains greater than 14 commas - I want it to put the filename into another file for emailing out.
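For that multi-line case, one portable sketch (assuming "more than 14 commas" means more than 15 comma-separated fields, since NF counts fields rather than separators) lets awk read every DK* file in one pass and print each offending filename once, using a seen[] array instead of the non-portable nextfile:

```shell
# Print each DK* file that has at least one line with more than
# 14 commas (i.e. NF > 15 fields with -F,); the seen[] array keeps
# a filename from being printed more than once.
awk -F, 'NF > 15 && !(FILENAME in seen) {
    print FILENAME
    seen[FILENAME]
}' DK*
```

The resulting list could then be redirected to $OUTPUT and mailed as before.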

I will test your suggestion and get back to you..

Thanks again for your prompt reply & help.

davy

I don't really know awk, but you will probably want something like

awk -F, 'BEGIN {nrecs=0;} {if (NF > nrecs) { nrecs = NF; } } END {print nrecs; }' $FILE

That might not be syntactically correct, but give it a go.
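The one-liner above is close; a sketch of how it might be wired into the earlier loop, one file at a time, with the uninitialised maximum guarded in END (this reuses the NF > 14 threshold from the thread, and the DK* filenames are illustrative):

```shell
# For each file, ask awk for the largest field count over all of
# its lines; flag the file in the shell when that maximum exceeds 14.
for FILE in DK*
do
    [ -f "$FILE" ] || continue
    max=`awk -F, 'NF > max { max = NF } END { print max+0 }' "$FILE"`
    [ "$max" -gt 14 ] && echo "$FILE"
done
```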

Andrew

1 Like

Two comments in case you are still running Solaris 10:

  • never use #!/bin/sh under Solaris 10 and older, use ksh (or bash).

  • never use awk under Solaris 10 and older, use nawk.

1 Like

Unfortunately, my awk doesn't provide the nextfile command, so we need the additional uniq. Try

awk 'NF != 14 {print FILENAME}' FS=, DK* | uniq | /usr/bin/mailx -s"Graftons Files that might cause issues" davy
1 Like

If the dir could contain thousands of files, the DK* glob might someday exceed the argument-list limit. You can also get rid of that uniq.

ls | grep "^DK" | xargs awk -F, '(NF != 14) && !(FILENAME in A) { print FILENAME ; A[FILENAME] }' |
        /usr/bin/mailx -s"Graftons Files that might cause issues" davy

If your files are all really huge, this version might be faster:

ls | grep "^DK" | xargs -n1 awk -F, '(NF != 14) { print FILENAME ; exit }' |
        /usr/bin/mailx -s"Graftons Files that might cause issues" davy
1 Like

Hi All,

Amazing what a wee night's sleep can do....

I found my issue was trying to run awk inside the if statement, so I changed it and amended it to what I need...

. . .
OUTPUT=/tmp/output.txt
LIST=`ls -1 POM* 2>/dev/null`
. . .

for FILE in $LIST
do
    awk -F"," ' { if( NF > 14){ print FILENAME" line:> " NR }}' $FILE >> $OUTPUT
done

this works perfectly for what I need it for...

Thanks for all your suggestions.

davy

P.s. Yes still on Solaris 10 but moving to 11 later in the year.

If you do not use $LIST elsewhere you can do

. . .
OUTPUT=/tmp/output.txt
. . .

for FILE in POM*
do
    [ -f "$FILE" ] || continue
    awk -F"," '{ if (NF > 14) { print FILENAME" line:> " NR } }' "$FILE"
done > $OUTPUT

The whole output stream of the loop is redirected to $OUTPUT once, which is more efficient than appending to it on every iteration.
The glob in the for loop requires a test to ensure presence, and the -f test also ensures that it is a regular file (not a directory), which makes it more robust.
Keep $FILE in quotes, or the shell will split it on whitespace and attempt glob expansion.

1 Like