How to check a column contain numeric or char data type ??

I have a file called clientname_filename.csv
whose contents are like
col1|col2|col3|col4|
510|abc|xxx|450|
510|abc11|yyy|350
510|pqr99|zzz| 670
512|222|439|110

Here i have check the contents of column for data type.
i have a constraints that col1 always contain Numeric value column 2 can contain char type or alphanumeric ,column 3 should always char type,column 4 always numeric value

here i have a constarinfile called clientname_filename_constraint.csv

contents of this file is nothing but the datatype each column contain
col1|col2|col3|col4
Nu|char|char|Nu

Based on this file i want to check the main file and verify the data type of the column if any column violate the constraints it should show error.

here col 1 is right as it contain only numeric,col2 is invalid as it contain a numeric at the end, column 3 is also invalid for same reason. Column is right as it contain numeric only .

How will i check whether a column contain only numeric value or char value ??
please assist !!

try a shell script like this - you need to order the isnumeric and isalpha tests
per your requirements, I have them as col1 + col4 numeric col2 + col4 alpha:

#!/bin/ksh

isnumeric()
{
	result=$(echo "$1" | tr -d '[[:digit:]]')
	echo ${#result}
}

isalpha()
{
	result=$(echo "$1" | tr -d '[[:alpha:]]')
	echo ${#result}
}
col1=""
col2=""
col3=""
col4=""
let retval=1

while read record
do
	echo "$record" | awk -F"|" '{print $1, $2, $3, $4 }' | read col1 col2 col3 col4
	
	if [[ $(isnumeric "$col1") -eq 1 && $(isnumeric "$col4") -eq 1 ]]; then
		 retval=1
	else
		 retval=0
		 break
	fi

	if [[ $(isalpha "$col2") -eq 1 && $(isalpha "$col3") -eq 1 ]]; then
		 retval=1
	else
		 retval=0
		 break
	fi
done  < filename

if [[ $retval -eq 1 ]]; then
	echo "test is okay"
else
	echo "test failed for this row:"
	echo "$col1 $col2 $col3 $col4"
fi

For char

awk -F"|" ' $2~ "^[a-zA-Z][a-z[A-Z]*$" { print $2 }' file

if it contains special char then

awk -F"|" ' $2~ "^[^0-9][^0-9]*$" { print $2 }' file

For number

awk -F"|" ' $1 ~ "^[0-9][0-9]*$" { print $1 }' file

Anbu & Jim thanks a lot for the clue
Can i make this $1 or $2 as variable as for some file $1 can be numeric but for other file the first column constrint could be character
so depend upon the constrint file i want to check the main file.

Thanks for the clue

Python alternative:

>>> for lines in open("clientname_filename.csv"):
... 	lines = lines.strip().split("|")
... 	if not lines[0].isdigit() or not lines[3].isdigit():
...  		print "First or  fourth col not numeric: %s" %(lines)
... 	elif not lines[1].isalnum():
... 		print "Second col not alphanumeric: %s" %(lines)
... 	elif not lines[2].isalpha():
... 		print "Third col not char type: %s" %(lines)

try this

awk -F"|" ' 
BEGIN {
getline var < "constraintfile"
getline var < "constraintfile" 
n = split( var , arr , "|" )
for ( i = 1 ; i <= n ; ++i )
{
        if( arr == "Nu" )
                regexp = "^[0-9][0-9]*$"
        if( arr == "char" )
                regexp = "^[a-zA-Z][a-z[A-Z]*$"
print regexp
}
}
$1 ~ regexp[1] && $2 ~ regexp[2] && $3 ~ regexp[3] && $4 ~ regexp[4] {
print $0
}' file

Anbu,

Thanks a lot for being with me and helping me.
can you please tell me why we ned twice

getline var < "constraintfile"
getline var < "constraintfile"

??
Also for sample purpose i have given 4 column now but in actual we can have max 150 column.
how to deal if the number column is more then 30 .

awk -F"|" ' $1 ~ "^[0-9][0-9]*$" { print $1 }' file

this line work fine for all the test value like
23232@@
ww232323
2322%.%
3434*
-----
but why 4343$$44 is showing as numeric value ???
also 343$34
show as numeric and showing 3434

This expression is working for all other but not when the value contain $
can any one help ??

also i tried with echo $var | sed -n '/^[0-9][0-9]*$/p' same thing happened
any thing consist of 0-9 should be considered as numeric , this sed expression working fine except for the value 445$$343 or 23$899 some thing like this which contain $ , it is considering that as numeric .

I have taken the following

as constraint file. We need the second line right

It should work fine. Its working fine for me. I dont know whats happening in your case


  1. 0-9 ↩︎

  2. 0-9 ↩︎

try this

awk -F"|" ' 
BEGIN {
getline var < "constraintfile"
getline var < "constraintfile"
n = split( var , arr , "|" )
for ( i = 1 ; i <= n ; ++i )
{
        if( arr == "Nu" )
                regexp = "^[0-9][0-9]*$"
        if( arr == "char" )
                regexp = "^[a-zA-Z][a-zA-Z]*$"
print regexp
}
}
#$1 ~ regexp[1] && $2 ~ regexp[2] && $3 ~ regexp[3] && $4 ~ regexp[4] {
#print $0
{
for( i = 1 ; i <= NF ; ++i )
{
        if ( ! match ( $i , regexp ) )
                break;
        if ( i == NF )
                print $0
}
}' file

"^[0-9][0-9]*$" with regular expression to check for numeric is failed as it is accepting 4545$22 as numeric aslo 34343&232 is also as numeric
i want to accept 343.343 as numeric and 343343 as numeric other then that are non numeric.
I mean only decimal place and pure numeric form are considerd as numeric .
any special char inside is non numeric

To accept 343.343 as numeric use this

"[2][0-9.]*$"


  1. 0-9 ↩︎

  2. 0-9. ↩︎