Validate each field in a text file

Hello,

I have a file with a lot of records like the one below:

00001,CUSTR,CUSTOMER ADDRESS,02310,N,0:00,0,0,0,0,0,0,0,0,0,0,0,0:00,0,0,0,0,0,CSH,ACC

Can I validate each record in the file and report which fields are incorrect?

field 1 - customer number, should be 5 digits
field 2 - should be a 5-letter string, no digits
field 3 - customer address - any string, numbers or spaces
field 4 - invoice number - should be 5 digits
field 5 - should be "N" or "Y"
field 6 - process time, should be "mm:ss"
field 7 to 17 - digits
field 18 - end process time, should be "mm:ss"
field 19 to 23 - digits
field 24 - 3-letter string (A to Z) - no blank, no "?"
field 25 - 3-letter string (A to Z) - no blank

Here is a sample record with errors:
00001,CUSTR,CUSTOMER ADDRESS,023a0,N,1,0,0,S,0,0,0,0,0,0,0,0,12:15,0,0,0,0,0,?,

The error report should look like:
record 1: field 4,field 6,field 9,field 24,field 25

thx!!

hello,

I have the records below and would like to change the time format from m:ss to mm:ss

input
00001,CUSTR,CUSTOMER,02310,N,0:00,0,0,0,0,0,0,0,0,0,0,0,0:00,0,0,0,0,0,CSH,
00001,CUSTR,CUSTOMER,02310,N,5:12,0,0,0,0,0,0,0,0,0,0,0,1:10,0,0,0,0,0,CSH,

output
00001,CUSTR,CUSTOMER,02310,N,00:00,0,0,0,0,0,0,0,0,0,0,0,00:00,0,0,0,0,0,CSH,
00001,CUSTR,CUSTOMER,02310,N,05:12,0,0,0,0,0,0,0,0,0,0,0,01:10,0,0,0,0,0,CSH,

Use Perl for this. I think it would be easiest. I don't have access to a test system, so I can't give you a tested example, but basically: read from STDIN, split each line into an array using ',' as the delimiter, then use regexes to check each field against its pattern.
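The split-and-match idea above can be sketched as follows. This uses awk from a shell function rather than Perl, so it is a swapped-in tool, not the Perl script the reply had in mind. The function name validate_records is hypothetical, and the rules encode assumptions the thread does not spell out: field 3 must be non-empty and contain only letters, digits and spaces, "digit" fields may hold one or more digits, and times must be exactly two digits, a colon, and two digits.

```shell
#!/bin/sh
# validate_records (hypothetical name): read comma-separated records on stdin
# and print one report line per bad record, listing the fields that failed.
validate_records() {
    awk -F',' '
    # Return 1 when field number n with value v satisfies its rule, else 0.
    function ok(n, v) {
        if (n == 1 || n == 4)   return v ~ /^[0-9][0-9][0-9][0-9][0-9]$/
        if (n == 2)             return v ~ /^[A-Za-z][A-Za-z][A-Za-z][A-Za-z][A-Za-z]$/
        if (n == 3)             return v ~ /^[A-Za-z0-9 ]+$/   # assumed rule
        if (n == 5)             return v == "Y" || v == "N"
        if (n == 6 || n == 18)  return v ~ /^[0-9][0-9]:[0-9][0-9]$/
        if (n == 24 || n == 25) return v ~ /^[A-Z][A-Z][A-Z]$/
        return v ~ /^[0-9]+$/   # fields 7-17 and 19-23
    }
    {
        bad = ""
        # A missing field reads as an empty string and fails its rule.
        for (i = 1; i <= 25; i++)
            if (!ok(i, $i))
                bad = bad (bad == "" ? "" : ",") "field " i
        # Also flag records that do not have exactly 25 fields.
        if (NF != 25)
            bad = bad (bad == "" ? "" : ",") "field count " NF
        if (bad != "")
            print "record " NR ": " bad
    }'
}
```

Called as validate_records < data, it prints nothing for clean records. Records whose times are still in m:ss form are reported on fields 6 and 18, so the time conversion should be run first.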

Try:
sed 's/,\([0-9]:[0-9][0-9]\)/,0\1/g' data

Assuming "data" is a file with the input.

Both threads refer to the same problem, so I have merged them. You can follow the same directions; when you get to a time field, if it is in x:xx form, add a leading '0'.
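A minimal sketch of that per-field approach, assuming the times are always in fields 6 and 18 as in the sample records. The function name pad_times is hypothetical. Unlike a pattern applied to the whole line, this leaves any other field that happens to look like x:xx untouched.

```shell
#!/bin/sh
# pad_times (hypothetical name): turn m:ss into mm:ss in fields 6 and 18 only.
pad_times() {
    awk 'BEGIN { FS = OFS = "," }
    {
        # Only a single digit before the colon gets a leading zero;
        # values already in mm:ss form do not match and pass through.
        if ($6  ~ /^[0-9]:[0-9][0-9]$/) $6  = "0" $6
        if ($18 ~ /^[0-9]:[0-9][0-9]$/) $18 = "0" $18
        print
    }'
}
```

It would be run as pad_times < data > data.new, with data being the input file.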

Can you provide an example for me?

thx!

Can anyone help with this?

I hope you have already fixed fields 6 and 18 to be in MM:SS format.
For the rest of the validation you can use the script below.

It might be a roundabout way, but this is what I have developed.

NOTE: I had to strip the spaces from the input so that the for loop reads each record as a single word, which means the printed lines have no spaces either.

#!/bin/ksh

### Functions ###

is_digit()
{
        strLen=${#1}
        # Delete the digits; tr takes a bare range, brackets would be deleted as literals
        check_no=$(echo "$1" | tr -d '0-9')
        if [[ $strLen -eq 5 && $check_no = "" ]]
        then
                return $SUCCESS
        else
                return $FAIL
        fi
}
is_string()
{
        strLen=${#1}
        # Delete the letters; tr takes a bare range, brackets would be deleted as literals
        check_no=$(echo "$1" | tr -d 'A-Za-z')
        if [[ $strLen -eq 5 && $check_no = "" ]]
        then
                return $SUCCESS
        else
                return $FAIL
        fi
}
validate()
{
        vRecord=$1
        vField=$2
        if [[ $vField -eq 1 || $vField -eq 4 ]]  # FOR FIELDS 1 AND 4
        then
                is_digit $vRecord
                if [[ $? -eq $SUCCESS ]]
                then
                        return $SUCCESS
                else
                        return $FAIL
                fi
        fi
        if [[ $vField -eq 2 ]]                  # FOR FIELDS 2
        then
                is_string $vRecord
                if [[ $? -eq $SUCCESS ]]
                then
                        return $SUCCESS
                else
                        return $FAIL
                fi
        fi
        if [[ $vField -eq 5 ]]                 # FOR FIELDS 5
        then
                if [[ $vRecord = "N" || $vRecord = "Y" ]]
                then
                        return $SUCCESS
                else
                        return $FAIL
                fi
        fi
        if [[ $vField -eq 24 || $vField -eq 25 ]]  # FOR FIELDS 24 AND 25
        then
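                # Anchor the pattern so only exactly three letters match, and reject an empty value
                tmpStr=$(echo "$vRecord" | sed 's/^[A-Z]\{3\}$//')
                if [[ -n $vRecord && $tmpStr = "" ]]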
                then
                        return $SUCCESS
                else
                        return $FAIL
                fi
        fi
        if [[ $vField -ge 7 && $vField -le 17 ]] || [[ $vField -ge 19 && $vField -le 23 ]] # FOR 7-17 AND 19-23
        then
                # Delete every digit (g flag) so multi-digit values pass; reject an empty value
                tmpStr=$(echo "$vRecord" | sed 's/[0-9]//g')
                if [[ -n $vRecord && $tmpStr = "" ]]
                then
                        return $SUCCESS
                else
                        return $FAIL
                fi
        fi

        # Fields 3, 6 and 18 are not checked here, so accept them explicitly
        return $SUCCESS
}

### Main ###

export FAIL=1
export SUCCESS=0
InpFile=$1
NO_OF_COLUMNS=25
for line in `cat $InpFile | tr -d ' '`
do
        PRINT_FLAG=0
        field_no=1
        while [[ $field_no -le $NO_OF_COLUMNS ]]
        do
                fRecord=$(echo "$line" | cut -d',' -f"$field_no")
                # Quote the arguments so an empty field is still passed as $1
                validate "$fRecord" "$field_no"
                if [[ $? -eq $FAIL ]]
                then
                        PRINT_FLAG=1
                        break;
                fi
                field_no=$((field_no+1));
        done
        if [[ $PRINT_FLAG -eq $SUCCESS ]]
        then
                        # Only records that passed every check are printed
                        echo "$line"
        fi
done

thank you very much for your help.

I have some ideas for this now :)

Can I ask you one more question?
If field 6 has an incorrect time format like "hh:mm:ss:00", can I write a simple script to delete the last two digits and show the correct form, "hh:mm:ss"?

Thx again!!

Yes, you can:

echo 12:30:00:00 | sed 's/\([0-9]\{1,2\}:[0-9]\{1,2\}:[0-9]\{1,2\}\).*/\1/'
12:30:00

or

Time=12:30:00:00
New_time=${Time%:*}
echo $New_time
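
Both commands above work on a single time string. To apply the same trimming to field 6 of every record in the file, something like the following sketch could be used. It assumes the time is in field 6 and that only values with four colon-separated numeric parts should be shortened; the function name trim_time_field is hypothetical.

```shell
#!/bin/sh
# trim_time_field (hypothetical name): hh:mm:ss:NN -> hh:mm:ss in field 6.
trim_time_field() {
    awk 'BEGIN { FS = OFS = "," }
    {
        # Drop the last ":NN" part only when field 6 has four numeric parts;
        # any other value is printed unchanged.
        if ($6 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) sub(/:[0-9]+$/, "", $6)
        print
    }'
}
```

Usage would be trim_time_field < data > data.new.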