Missing comma delimiter in columns

hi

If the comma delimiter is missing in the column header, treat it as a bad file; otherwise it's a good file. I'm only checking the columns, not the data.

id,name,sal,deptno =======> good file
1,awa,200,10
2,aba,100,20
3,cdc,300,30

idname,sal,deptno ========> bad file, since it's missing a comma (,)
1,awa,200,10
2,aba,100,20
3,cdc,300,30
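The check described above can be sketched as a one-off header test; the sample header and the expected column count of 4 are taken from the example data, so adjust both for real files:

```shell
# Sketch of the header-only check; the sample header and the expected
# column count of 4 are assumptions taken from the example above.
printf 'idname,sal,deptno\n1,awa,200,10\n' > sample.csv   # header missing a comma
header_cols=$(head -1 sample.csv | awk -F, '{print NF}')
if [ "$header_cols" -eq 4 ]; then
    result="good file"
else
    result="bad file"
fi
echo "$result"    # prints "bad file" for this sample
```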

thanks.

Do you know in advance how many fields to expect or do you need to read the first line of the data to get that detail?
If not:

if [ "X$(head -1 file.dat | awk -F, '{print $4}')" = "X" ] ; then
   echo "Bad file"
else 
   echo "Good file"
fi

This will check each record in the input file and finally determine whether it is good or bad, based on whether the number of columns equals 4.

awk -F',' '{(NF==4)?(x=0):(x=1)} END{if(x==0){print "good"}else{print "bad"}}' inputfile

Thanks a lot for all your replies. I don't know how many columns will be in each file; I mentioned 4 columns just as an example. I got the logic and will try using it.

Here is the code I was trying for the data. Now I have to check only the columns. How can I modify the same script to check just the columns? Currently it checks each line, but I need to check only the line that contains the column headers.

#!/bin/ksh
BASE_DIR=/data/SrcFiles
cd $BASE_DIR
## find the files in the work directory which changed within the last 3 days
find . -type f -name "*.csv" -ctime -3 > /home/mydir/flist.txt
## loop thru all the file names
while read line
do
  ## get only the base name of the file
  FN=`basename $line`
  ## DC counts the number of fields on each line, sorts them and keeps the unique counts
  ## for a good file without any missing delimiter the count should be one
  DC=`awk -F "," '{print NF}' $FN | sort | uniq -c | wc -l`
  ## from the above we know that a good file always has DC equal to one
  if [ $DC -ne 1 ]; then
    echo $DC
    echo $FN >> /home/mydir/badfile.txt
    ## to also remove the corrupted bad files here, uncomment: # rm $FN
  else
    echo $DC
    echo $FN >> /home/mydir/gfile.txt
  fi
done < /home/mydir/flist.txt

Please use code tags. Codes/data samples would be easier to read.

This should work for any number of columns:

awk -F, 'NR==1{n=NF} n!=NF{b=1;exit} END{print b?"Bad":"Good"}' infile

Version with return code:

awk -F, 'NR==1{n=NF} n!=NF{b=1;exit} END{exit b}' infile
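The exit status can then be tested directly in an `if`; a minimal sketch, where the sample file name and contents are made up for illustration:

```shell
# Build a small sample file with a consistent field count
# (the name and contents are illustrative).
printf 'id,name,sal,deptno\n1,awa,200,10\n' > sample.csv
if awk -F, 'NR==1{n=NF} n!=NF{b=1;exit} END{exit b}' sample.csv
then
    echo "Good"    # every line has the same field count as the header
else
    echo "Bad"
fi
```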


Multiple files:

awk -F, 'function pr(){print f ": " (b?"Bad":"Good")} FNR==1{if(NR>1)pr();n=NF;f=FILENAME;b=0} !b && n!=NF{b=1} END{pr()}' infile*
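As a quick sanity check of the multi-file version (the sample file names and contents below are made up for illustration), it reports each file on its own line:

```shell
# Create two small sample files (names and contents are illustrative).
printf 'id,name,sal,deptno\n1,awa,200,10\n' > good1.csv
printf 'idname,sal,deptno\n1,awa,200,10\n' > bad1.csv    # header missing a comma
awk -F, 'function pr(){print f ": " (b?"Bad":"Good")} FNR==1{if(NR>1)pr();n=NF;f=FILENAME;b=0} !b && n!=NF{b=1} END{pr()}' good1.csv bad1.csv
# prints:
# good1.csv: Good
# bad1.csv: Bad
```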

Yes, it's working.

Thanks Scrutinizer, appreciated. Now we have to put all the bad file names in a text file in the same dir, and the same with good files. Something like this:

X=$(awk -F, 'function pr(){print f ": " (b?"Bad":"Good")} FNR==1{if(NR>1)pr();n=NF;f=FILENAME;b=0} !b && n!=NF{b=1} END{pr()}' infile*)

if [ $X -ne 1 ]; then
  echo $X
  echo $FN >> /home/mydir/badfile.txt
  ## to also remove the corrupted bad files here, uncomment: # rm $FN
else
  echo $X
  echo $FN >> /home/mydir/gfile.txt
fi

Thanks again for all your replies. I learned a lot.

Hi, try a slight modification:

awk -F, 'function pr(){sf=(b?"Bad":"Good") "file.txt";print f>sf} FNR==1{if(NR>1)pr();n=NF;f=FILENAME;b=0} !b && n!=NF{b=1} END{pr()}' *.csv


If you want to use script:

csvok() {
  awk -F, 'NR==1{n=NF} n!=NF{b=1;exit} END{exit b}' "$1"
}

for file in *.csv
do
  if csvok "$file"; then
    echo "$file" >> /home/mydir/goodfile.txt
  else
    echo "$file" >> /home/mydir/badfile.txt
  fi
done