Missing comma delimiter in columns

hi

If the comma delimiter is missing in the column header, treat it as a bad file; otherwise it's a good file. I'm only checking the columns, not the data.

id,name,sal,deptno =======> good file
1,awa,200,10
2,aba,100,20
3,cdc,300,30

idname,sal,deptno ========> bad file, since it's missing a comma (,)
1,awa,200,10
2,aba,100,20
3,cdc,300,30
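The check described above can be sketched as a one-off header test; the sample header and the expected column count of 4 are taken from the example data, so adjust both for real files:

```shell
# Sketch of the header-only check; the sample header and the expected
# column count of 4 are assumptions taken from the example above.
printf 'idname,sal,deptno\n1,awa,200,10\n' > sample.csv   # header missing a comma
header_cols=$(head -1 sample.csv | awk -F, '{print NF}')
if [ "$header_cols" -eq 4 ]; then
    result="good file"
else
    result="bad file"
fi
echo "$result"    # prints "bad file" for this sample
```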

thanks.

Do you know in advance how many fields to expect or do you need to read the first line of the data to get that detail?
If not:

if [ "X$(head -1 file.dat | awk -F, '{print $4}')" = "X" ] ; then
   echo "Bad file"
else 
   echo "Good file"
fi

This will check each record in the input file and finally determine whether it is good or bad, based on whether the number of columns equals 4.

awk -F',' '{(NF==4)?(x=0):(x=1)} END{if(x==0){print "good"}else{print "bad"}}' inputfile

Thanks a lot for all your replies. I don't know how many columns will be in each file; I mentioned 4 columns just as an example. I got the logic and will try using it.

Here is the code I was trying for the data. Now I have to check only the columns. How can I modify the same script to check just the columns? Currently it checks each line, but I need to check only the line that contains the column headers.

#!/bin/ksh
BASE_DIR=/data/SrcFiles
cd $BASE_DIR
## find the files in the work directory which changed within the last 3 days
find . -type f -name "*.csv" -ctime -3 > /home/mydir/flist.txt
## loop thru all the file names
while read line
do
  ## get only the base name of the file
  FN=`basename $line`
  ## DC counts the number of fields on each line, sorts them and keeps the unique counts
  ## for a good file without any missing delimiter the count should be one
  DC=`awk -F "," '{print NF}' $FN | sort | uniq -c | wc -l`
  ## from the above we know that a good file always has DC equal to one
  if [ $DC -ne 1 ]; then
    echo $DC
    echo $FN >> /home/mydir/badfile.txt
    ## to also remove the corrupted bad files here, uncomment: # rm $FN
  else
    echo $DC
    echo $FN >> /home/mydir/gfile.txt
  fi
done < /home/mydir/flist.txt

Please use code tags. Codes/data samples would be easier to read.

This should work for any number of columns:

awk -F, 'NR==1{n=NF} n!=NF{b=1;exit} END{print b?"Bad":"Good"}' infile

Version with return code:

awk -F, 'NR==1{n=NF} n!=NF{b=1;exit} END{exit b}' infile
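The exit status can then be tested directly in an `if`; a minimal sketch, where the sample file name and contents are made up for illustration:

```shell
# Build a small sample file with a consistent field count
# (the name and contents are illustrative).
printf 'id,name,sal,deptno\n1,awa,200,10\n' > sample.csv
if awk -F, 'NR==1{n=NF} n!=NF{b=1;exit} END{exit b}' sample.csv
then
    echo "Good"    # every line has the same field count as the header
else
    echo "Bad"
fi
```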


Multiple files:

awk -F, 'function pr(){print f ": " (b?"Bad":"Good")} FNR==1{if(NR>1)pr();n=NF;f=FILENAME;b=0} !b && n!=NF{b=1} END{pr()}' infile*
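As a quick sanity check of the multi-file version (the sample file names and contents below are made up for illustration), it reports each file on its own line:

```shell
# Create two small sample files (names and contents are illustrative).
printf 'id,name,sal,deptno\n1,awa,200,10\n' > good1.csv
printf 'idname,sal,deptno\n1,awa,200,10\n' > bad1.csv    # header missing a comma
awk -F, 'function pr(){print f ": " (b?"Bad":"Good")} FNR==1{if(NR>1)pr();n=NF;f=FILENAME;b=0} !b && n!=NF{b=1} END{pr()}' good1.csv bad1.csv
# prints:
# good1.csv: Good
# bad1.csv: Bad
```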

Yes, it's working.

Thanks Scrutinizer, appreciated. Now we have to put all the bad file names in a text file in the same dir, and the same with good files. Something like this:

X=$(awk -F, 'function pr(){print f ": " (b?"Bad":"Good")} FNR==1{if(NR>1)pr();n=NF;f=FILENAME;b=0} !b && n!=NF{b=1} END{pr()}' infile*)

if [ $X -ne 1 ]; then
  echo $X
  echo $FN >> /home/mydir/badfile.txt
  ## to also remove the corrupted bad files here, uncomment: # rm $FN
else
  echo $X
  echo $FN >> /home/mydir/gfile.txt
fi

Thanks again for all your replies. I learned a lot.

Hi, try a slight modification:

awk -F, 'function pr(){sf=(b?"Bad":"Good") "file.txt";print f>sf} FNR==1{if(NR>1)pr();n=NF;f=FILENAME;b=0} !b && n!=NF{b=1} END{pr()}' *.csv


If you want to use script:

csvok() {
  awk -F, 'NR==1{n=NF} n!=NF{b=1;exit} END{exit b}' "$1"
}

for file in *.csv
do
  if csvok "$file"; then
    echo "$file" >> /home/mydir/goodfile.txt
  else
    echo "$file" >> /home/mydir/badfile.txt
  fi
done