Shell script for validating fields in a file

asemota · January 28, 2010, 9:44am

Hi,

I have not used Unix in a very long time and I am very rusty. I would appreciate any help I can get from those more experienced in shell scripting.

I am reading one file at a time from a folder. Each file is a flat file with no delimiters or carriage returns. Col1 through col6 are the header section and should be read once. Col7 through col14 are the body, which can have one or more records. I read the file and check that none of these fields are blank. If a field is blank, I would like to write a message to a log file that can be viewed later. I haven't gotten to the log file yet; right now I am just trying to read the file, and the inner loop doesn't seem to work. Please help me.

 
#!/bin/ksh
 
#This script was created on 01/27/2010 to check outbound
# files for missing required fields
###########################################################
 
 
for file in /folder/file.1$.5$
 
do
 
i=1
 
exec< file
 
while read line
do
 
col1=`echo $line | cut -c1-4`
col2= `echo $line | cut -c5-5`
col3= `echo $line | cut -c6-7`
col4= `echo $line | cut -c8-16`
col5= `echo $line | cut -c17-23`
col6= `echo $line | cut -c51-56`
 
if [ -z "$col1" ]
then
echo "Line No. $i -- No String in position 1-4 "
else
echo "Line No. $i -- String in position 1-4 : $col1"
fi
 
if [ -z "$col2" ]
then
echo "Line No. $i -- No String in position 5 "
else
echo "Line No. $i -- String in position 5 : col2"
fi
 
if [ -z "$col3" ]
then
echo "Line No. $i -- No String in position 6-7 "
else
echo "Line No. $i -- String in position 6-7 : $col3"
fi
 
if [ -z "$col4" ]
then
echo "Line No. $i -- No String in position 8-16 "
else
echo "Line No. $i -- String in position 8-16 : $col4"
fi
 
if [ -z "$col5" ]
then
echo "Line No. $i -- No String in position 17-23 "
else
echo "Line No. $i -- String in position 17-23: $col5"
fi
 
if [ -z "$col6" ]
then
echo "Line No. $i -- No String in position 51-56 "
else
echo "Line No. $i -- String in position 51-56 : $col6"
fi
 
i=`expr $i + 1`
 
exec< file
 
 While read line
    do
 
        col7=`echo $line | cut -c66-72`
        col8= `echo $line | cut -c73-74`
        col9= `echo $line | cut -c75-83`
        col10= `echo $line | cut -c84-85`
        col11= `echo $line | cut -c86-88`
        col12= `echo $line | cut -c89-96`
        col13= `echo $line | cut -c97-100`
        col14= `echo $line | cut -c108-115`
 
 
        if [ -z "$col7"] or [ -z "$col8"] or [ -z "$col9"] or [ -z "$col10"]
          or[ -z "$col11" ] or [ -z "$col12" ] or [ -z "$col13"] or [ -z "$col14"]
        then
        echo "Line No. $i -- Missing String "
        else
        echo "Line No. $i -- String is valid"
        fi
 
        i=`expr $i + 1`
        done
done
 
done

Thank you all for your help.
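A note on the inner-loop problem, as a minimal POSIX sh sketch. It assumes one newline-terminated record per line with the header on line 1, and the function name split_header_body is hypothetical. In the script above, each "exec< file" opens a file literally named file (not $file) starting again from its first line, so the inner loop never continues from where the outer read stopped. Reading the header and the body inside one redirection keeps a single file position:

```shell
#!/bin/sh
# split_header_body FILE (hypothetical name)
# Assumption: line 1 is the header, every later line is a body record.
split_header_body() {
    {
        # First read: consume the header line exactly once.
        IFS= read -r header
        printf 'header: %s\n' "$header"
        n=0
        # Same open file as the read above, so this loop starts at line 2.
        while IFS= read -r record; do
            n=$((n + 1))
            printf 'body record %d: %s\n' "$n" "$record"
        done
    } < "$1"
}
```

For example, split_header_body /folder/file.10.5232004 would print the header line followed by each numbered body record.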

anbu23 · January 28, 2010, 11:04am

Remove space between = and `

col2= `echo $line | cut -c5-5`
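To illustrate the fix, a small sketch (POSIX sh; the sample value is the start of the record asemota posts below). With a space after =, the shell sees an empty assignment col2= followed by a command whose name is the output of the backquoted pipeline, so col2 never receives the value:

```shell
#!/bin/sh
# Sample: the first 21 characters of the record posted further down.
line='1111B0216262626111111'

# Broken (kept as a comment): the shell would try to run "B" as a command,
# with col2 set to an empty string only for that command.
#   col2= `echo $line | cut -c5-5`

# Working: no space around "=", $( ) instead of backquotes, "$line" quoted.
col2=$(printf '%s\n' "$line" | cut -c5-5)
echo "col2=[$col2]"   # prints col2=[B]
```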

If you don't have data for a column, do you have blank spaces there instead?

Can you show us the input?

asemota · January 28, 2010, 11:22am

Thanks for responding. I appreciate it.

Yes, if there is no data, it shows up as blanks.

1111B0216262626111111 999999 22222227823876489050456201001222010 20101031

The data above represents one record. A second record would be:

1111B0216262626111111 245367 33333338860870499050456201001182010 20100930

Looking at the data again, I changed the code as shown below. Col1 through col14 are in every record in the file.

 
for file in /folder/file.1$.5$
 
do
 
i=1
 
exec< file
 
while read line
do
 
col1=`echo $line | cut -c1-4`
col2= `echo $line | cut -c5-5`
col3= `echo $line | cut -c6-7`
col4= `echo $line | cut -c8-16`
col5= `echo $line | cut -c17-23`
col6= `echo $line | cut -c51-56`
 col7=`echo $line | cut -c66-72`
 col8= `echo $line | cut -c73-74`
 col9= `echo $line | cut -c75-83`
 col10= `echo $line | cut -c84-85`
 col11= `echo $line | cut -c86-88`
 col12= `echo $line | cut -c89-96`
 col13= `echo $line | cut -c97-100`
 col14= `echo $line | cut -c108-115`
 
if [ -z "$col1" ]
then
echo "Line No. $i -- No String in position 1-4 "
else
echo "Line No. $i -- String in position 1-4 : $col1"
fi
 
if [ -z "$col2" ]
then
echo "Line No. $i -- No String in position 5 "
else
echo "Line No. $i -- String in position 5 : col2"
fi
 
if [ -z "$col3" ]
then
echo "Line No. $i -- No String in position 6-7 "
else
echo "Line No. $i -- String in position 6-7 : $col3"
fi
 
if [ -z "$col4" ]
then
echo "Line No. $i -- No String in position 8-16 "
else
echo "Line No. $i -- String in position 8-16 : $col4"
fi
 
if [ -z "$col5" ]
then
echo "Line No. $i -- No String in position 17-23 "
else
echo "Line No. $i -- String in position 17-23: $col5"
fi
 
if [ -z "$col6" ]
then
echo "Line No. $i -- No String in position 51-56 "
else
echo "Line No. $i -- String in position 51-56 : $col6"
fi

 if [ -z "$col7"] or [ -z "$col8"] or [ -z "$col9"] or [ -z "$col10"]
 or[ -z "$col11" ] or [ -z "$col12" ] or [ -z "$col13"] or [ -z "$col14"]
 then
 echo "Line No. $i -- Missing String "
 else
 echo "Line No. $i -- String is valid"
 fi

 
i=`expr $i + 1`


done
 
done
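For comparison, a hedged sketch of the same per-field check with the shell-level problems fixed, written for POSIX sh (ksh accepts it too). Assumptions: one record per line and the 14 ranges from the script above; check_file and RANGES are hypothetical names. Besides the space after =, it addresses three issues: read without IFS= and -r trims leading and trailing blanks, an unquoted $line collapses runs of spaces and shifts every later column, and [ -z "$col" ] is false for a field made of spaces, which is how asemota says blank fields appear. The "or" between the tests is not shell syntax either; the shell spelling is ||, with a space before each ]. This version still starts several processes per field, which is the cost Jim describes below.

```shell
#!/bin/sh
# Sketch only: the 14 column ranges from the script above (start-end, 1-based).
RANGES='1-4 5-5 6-7 8-16 17-23 51-56 66-72 73-74 75-83 84-85 86-88 89-96 97-100 108-115'

# check_file FILE: report, per record, which ranges are empty or all spaces.
check_file() {
    i=1
    # IFS= and -r keep leading/trailing blanks and backslashes in $line;
    # redirecting the whole loop replaces the "exec< file" lines.
    while IFS= read -r line; do
        missing=''
        for r in $RANGES; do
            # "$line" is quoted so runs of blanks survive; tr removes spaces
            # so that a field made only of spaces ends up empty.
            val=$(printf '%s\n' "$line" | cut -c"$r" | tr -d ' ')
            if [ -z "$val" ]; then
                missing="$missing $r"
            fi
        done
        if [ -n "$missing" ]; then
            echo "$1 line $i -- blank field(s) at:$missing"
        else
            echo "$1 line $i -- all fields present"
        fi
        i=$((i + 1))
    done < "$1"
}
```

It could then be run over the folder, for example: for f in /folder/file.*.*; do check_file "$f"; done > report.txt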

Also, instead of using all these if statements, can I use a case statement? If so, how do I do it?

Thanks in advance for your help.
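One way to use case here, sketched in POSIX sh under the assumption that a blank field is either empty or all spaces. A case pattern can express "contains at least one non-blank character", which [ -z ] cannot, since a field of spaces is not an empty string. report_field is a hypothetical helper name; its messages mirror the ones in the script above:

```shell
#!/bin/sh
# report_field LINE_NO START END VALUE (hypothetical helper).
# The pattern *[!\ ]* matches any value with at least one non-blank
# character; empty or all-space values fall through to *).
report_field() {
    case $4 in
        *[!\ ]*) printf 'Line No. %s -- String in position %s-%s : %s\n' "$1" "$2" "$3" "$4" ;;
        *)       printf 'Line No. %s -- No String in position %s-%s\n' "$1" "$2" "$3" ;;
    esac
}
```

Inside the read loop, each if block would then shrink to one call such as report_field "$i" 1 4 "$col1".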

jim_mcnamara · January 28, 2010, 12:10pm

If you run that code over dozens of big files it will take forever. Every one of those backtick lines creates a separate process.

awk (I used nawk; same thing) was made for this kind of job.
Create a file, pos.txt, that holds the offsets.

for infile in /folder/file.1$.5$
do

  nawk -v infile="$infile" ' {
      # pos.txt: one "start end" pair per field; store start and length
      if(FILENAME=="pos.txt")
      {
           pos[FNR]=$1; len[FNR]=($2-$1)+1; vals++; next
      }
      # data file: check every field of every record
      if(FILENAME==infile)
      {
           for(i=1; i<=vals; i++)
           {
             testval=substr($0, pos[i], len[i])
             gsub(/ /, "", testval)    # a field of spaces counts as blank
             if(length(testval)==0) {print infile, "line:", FNR, " blank field:", i}
           }
           next
      }
  } ' pos.txt "$infile"

done > report.txt
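The post does not show pos.txt itself. Judging from the awk code (pos[FNR]=$1 and len[FNR]=($2-$1)+1), each line holds a 1-based start column and end column separated by a blank. A minimal sketch under that assumption, using the 14 ranges from asemota's script and plain awk; write_pos and check_with_pos are hypothetical names:

```shell
#!/bin/sh
# write_pos FILE: one "START END" pair per field (1-based, inclusive).
# The ranges are the ones from asemota's script; adjust them to your layout.
write_pos() {
    cat > "$1" <<'EOF'
1 4
5 5
6 7
8 16
17 23
51 56
66 72
73 74
75 83
84 85
86 88
89 96
97 100
108 115
EOF
}

# check_with_pos POSFILE DATAFILE: same logic as the script above, with the
# position file passed in as a variable instead of the fixed name pos.txt.
check_with_pos() {
    awk -v posfile="$1" -v infile="$2" '
        FILENAME == posfile { pos[FNR] = $1; len[FNR] = $2 - $1 + 1; vals++; next }
        {
            for (i = 1; i <= vals; i++) {
                testval = substr($0, pos[i], len[i])
                gsub(/ /, "", testval)
                if (length(testval) == 0)
                    print infile, "line:", FNR, "blank field:", i
            }
        }' "$1" "$2"
}
```

Reading the positions from a file means a layout change only touches pos.txt, not the awk program.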

asemota · January 29, 2010, 2:27pm

Thanks Jim.

Now I am having problems opening the file

/folder/file.1$.5$

I have multiple files in this folder with the following names

file.10.5232004
file.12.2112003
.........

What do I do?
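One way to match those names, sketched in POSIX sh (list_inputs is a hypothetical name). In a glob, * matches any run of characters, while a $ followed by . or by the end of the word is an ordinary character, so /folder/file.1$.5$ only matches a file literally named file.1$.5$:

```shell
#!/bin/sh
# list_inputs DIR (hypothetical): print files named like file.10.5232004.
list_inputs() {
    for f in "$1"/file.[0-9]*.[0-9]*; do
        # If nothing matches, the shell leaves the pattern unexpanded.
        [ -f "$f" ] || continue
        echo "$f"
    done
}
```

In Jim's loop this amounts to writing for infile in /folder/file.[0-9]*.[0-9]* (or simply /folder/file.*.*).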

Thanks for your help again.

---------- Post updated at 03:40 PM ---------- Previous update was at 02:10 PM ----------

Thanks, I got the code to work.

The problem I am having now is that it repeats the same thing over and over again in the report.txt file.

If I wanted to print only the file name, the number of records, and the length of each record for each file, how would I do that with the code below?

 
{print infile, "line:", FNR, " blank field:", i}

Thank you
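One possible approach, sketched with plain awk (record_summary is a hypothetical name): print one line per record using length($0), then the record count from NR in the END block. Because each call reads a single file, NR at the end equals that file's record count:

```shell
#!/bin/sh
# record_summary FILE (hypothetical): each record's length, then a total.
# Run it once per file, e.g. inside the for loop.
record_summary() {
    awk -v name="$1" '
        { print name, "record", FNR, "length", length($0) }
        END { print name, "records:", NR }
    ' "$1"
}
```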

---------- Post updated 01-29-10 at 12:47 AM ---------- Previous update was 01-28-10 at 03:40 PM ----------

Thank you!! I was able to find a solution in the forum.

---------- Post updated at 03:26 PM ---------- Previous update was at 12:47 AM ----------

I am having a hard time extracting the file name in the code above. Instead of printing /folder/file.1$.5$, I would like it to print just the file name, file.1$.5$.

I have tried using basename, but it looks like nawk or awk does not recognise basename. Each time I type it in, it prints out the word basename.

{print basename infile, "line:", FNR, " blank field:", i}

Thank you for your help!!!
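awk has no basename function; basename is a separate command that the shell runs, and inside an awk program the word is treated as an ordinary variable. Two possible workarounds, sketched with hypothetical function names: compute the short name in the shell and pass it in with -v, or strip the directory part inside awk with sub():

```shell
#!/bin/sh
# Option 1 (hypothetical name): let the shell run basename, then hand the
# result to awk as a variable.
name_via_shell() {
    short=$(basename "$1")
    awk -v infile="$short" '{ print infile, "line:", FNR }' "$1"
}

# Option 2 (hypothetical name): pass the full path and delete everything
# up to and including the last "/" before any input is read.
name_via_awk() {
    awk -v infile="$1" 'BEGIN { sub(/.*\//, "", infile) }
                       { print infile, "line:", FNR }' "$1"
}
```

Option 2 keeps everything inside one awk process, which matters little here because the loop already starts one awk per file.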

fastlane3000 · January 30, 2010, 6:02am

Hello everybody,

I have almost the same problem with my files (I am on AIX 5.3), and I am going to try this solution.
The main difference is that in my case I don't have a header, and I also can't use "pos.txt"
because I have offsets like that.

So is there another way that fits my purpose?

Thanks, Jim, for your great work, and thanks, asemota, for asking, so I could find a solution.
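Jim's script already applies the same check to every line, so a file without a header needs no special handling. If a separate pos.txt is not an option, one alternative is to pass the offsets on the command line. A minimal sketch in POSIX sh and awk, assuming each field can be written as START-END with 1-based inclusive columns (a single column as 5-5); check_specs is a hypothetical name:

```shell
#!/bin/sh
# check_specs "START-END START-END ..." FILE (hypothetical)
# Every spec needs both ends; write a one-column field as 5-5, not 5.
check_specs() {
    awk -v specs="$1" -v infile="$2" '
        BEGIN {
            # Split the spec list on blanks, then each spec on "-".
            n = split(specs, s, " ")
            for (k = 1; k <= n; k++) {
                split(s[k], se, "-")
                pos[k] = se[1] + 0
                len[k] = se[2] - se[1] + 1
            }
        }
        {
            for (k = 1; k <= n; k++) {
                v = substr($0, pos[k], len[k])
                gsub(/ /, "", v)          # all spaces counts as blank
                if (length(v) == 0)
                    print infile, "line:", FNR, "blank field:", k
            }
        }' "$2"
}
```

For example: check_specs '1-4 5-5 6-7 8-16' "$f" >> report.txt, with the specs replaced by your own offsets.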