Checking for certain characters

SAMZ · July 17, 2008, 6:22am

Could anyone help with the following enquiry? I have a file in the following format:

ID .... VALUE
A001 .... 100
B002 .... 200
A004 .... 300
B006 .... 100
A997 .... 200
B776 .... 400

It is in a column format, but I want to check that the ID field always begins with an A or B character. This is my logic thus far:

If Character 1 DOES NOT equal A or B
then
display error message
else
carry on doing what you want
fi

I'm not really sure how to check that character 1 of each line does not equal A or B.
I have tried the following with no joy:

If [ ! grep '^A' $file || ! grep '^B' $file ] then
print "error"
else
print "it works"
fi

But the above does not work. I believe the problem is my logic, as my Unix understanding is not great. Please assist.

phemanth24 · July 17, 2008, 6:49am

Try this:

/(^A)|(^B)/ {num++}
END {if (num > 0) printf("%d instances of A,B exist\n", num);}

You can modify the output the way you want.

SAMZ · July 17, 2008, 7:12am

I can't seem to get the above to work. Could you explain further, please?

phemanth24 · July 17, 2008, 7:17am

Hi Samz. I should have elaborated further.
I put my code into a script and ran it with awk.

awk -f <script name> <file>

Currently it prints the number of instances of A and B

phemanth24 · July 17, 2008, 7:21am

BTW, you could also bypass putting this into a script.

awk '/(^A)|(^B)/ {num++} END {if (num > 0) printf("%d instances of A,B exist\n", num);}' column

'column' is the file where I have the format you specified.

Diabolist · July 17, 2008, 7:25am

The input file:

$ cat ttt
ID .... VALUE
-------------
A001 .... 100
C003 .... 800
B002 .... 200
corrupt
data
A004 .... 300
C003 .... 800
foo .... bar

The script:

#!/bin/ksh

INPUT=ttt

{ while read LINE
do
  echo "$LINE" | egrep "^A|^B" > /dev/null 2>&1
  if [ $? -eq 0 ]
  then
    echo "Processing $LINE"
  else
    echo "Skipping $LINE"
  fi
done } < $INPUT

The output:

$ ./ttt.ksh
Skipping ID .... VALUE
Skipping -------------
Processing A001 .... 100
Skipping C003 .... 800
Processing B002 .... 200
Skipping corrupt
Skipping data
Processing A004 .... 300
Skipping C003 .... 800
Skipping foo .... bar

You could do a single string of commands using awk for the pattern matching, but I'm not sure how you want to process the line once you verify it's good... so this may offer the most flexibility.

Let us know if you need anything in the script explained.

SAMZ · July 17, 2008, 8:26am

phemanth24:

BTW, you could also bypass putting this into a script.
awk '/(^A)|(^B)/ {num++} END {if (num > 0) printf("%d instances of A,B exist\n", num);}' column
'column' is the file where I have the format you specified.

OK, the above only tells me how many times A or B were in there. I need it to print an error message if a C exists; otherwise it's fine to continue processing the file. Hope that makes more sense.

SAMZ · July 17, 2008, 11:16am

Anyone out there willing to help a lost man?

fsahog · July 17, 2008, 8:16pm

In the shell script above, you can see where it reports a bad line. If you put the script in a file with the execute bit set, you can run it. Take the branch that reports the error, i.e. "Skipping line", and have it say, for example, "exit 1". Then at the end, assuming it never hit that branch, say "exit 0". Remove the printing where the line is OK (starts with A or B). Then your script's exit value is zero or non-zero and you can use it as a test. Have its input come from standard input, and it can "filter" for you. I realize this might be lots of theory and not enough specifics. If so, I could give you exact examples. Other folks here could too, I imagine. If you need it and no one has by the time I get back here, I'll do it.
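
The approach described above could be sketched roughly as follows. This is only one possible reading of it: the function name check_ids is made up for illustration, and the sketch assumes the header row has already been removed before the data reaches standard input.

```shell
#!/bin/sh
# check_ids: hypothetical helper name, not from the thread.
# Reads lines on standard input. Returns 1 at the first line that
# does not start with A or B, and 0 if every line passes.
check_ids() {
    # "|| [ -n "$line" ]" also handles a last line with no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            [AB]*) ;;    # good ID, say nothing and keep going
            *)     echo "error: bad ID in line: $line" >&2
                   return 1 ;;
        esac
    done
    return 0
}
```

A calling script might then use something like "tail -n +2 TEST1.CSV | check_ids || exit 1", where tail -n +2 drops the header line. The case pattern avoids starting a separate egrep process for every line.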

matrixmadhan · July 20, 2008, 12:42am

awk 'NR > 1 { if( /^A/ || /^B/ ) { printf "%s - Correct - Carry on with what you are doing!\n", $0 } else { printf "Error\n" } }' filename
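
If the goal is to stop at the first bad ID rather than print a message per line, a variant of the one-liner above could set the exit status instead. A minimal sketch, assuming the first line is a header (as NR > 1 implies); the function name validate_ids_awk is hypothetical:

```shell
#!/bin/sh
# validate_ids_awk: hypothetical name. $1 is the file to check.
# Exit status is 1 if any line after the header does not start
# with A or B, and 0 otherwise.
validate_ids_awk() {
    awk 'NR > 1 && !/^[AB]/ {
             printf "Error: line %d has a bad ID: %s\n", NR, $0
             bad = 1
             exit              # stop reading; the END rule still runs
         }
         END { exit bad + 0 }  # 0 when no bad line was seen
        ' "$1"
}
```

The caller can then test the status directly, for example "validate_ids_awk TEST1.CSV || exit 1".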

SAMZ · July 21, 2008, 4:59am

Thanks for all the above help, but let me clarify what is required to see if anyone can assist further.
As stated, I currently have the following file called TEST1.CSV:

ID ,, VALUE
A001 ,, 100
B002 ,, 200
A004 ,, 300
B006 ,, 100
A997 ,, 200
B776 ,, 400

This file needs to be processed, but only if all the IDs are correct (that is, only if every ID begins with an A or B character).
If an ID begins with any character other than A or B, the script needs to exit with an error message.

What i would like is the following:

IF [beginning of row is not A or B] THEN
print error message
exit 1
END IF

No else is required: if there are only A and B IDs, the IF statement ends successfully and the rest of the script carries on.
So, any thoughts? It's not really as complex as previously thought.
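
The pseudocode above maps fairly directly onto a shell "if" that tests a command's exit status. A rough sketch, assuming the first line of the file is the header row and should not be checked; the function name ids_ok is invented for illustration:

```shell
#!/bin/sh
# ids_ok: hypothetical helper. Succeeds only if every data row of the
# file named in $1 starts with A or B. Assumes line 1 is a header.
ids_ok() {
    # Refuse unreadable files so a missing file is not reported as OK.
    [ -r "$1" ] || return 2
    # tail -n +2 drops the header. grep -v selects rows that do NOT
    # start with A or B and exits 0 when it finds at least one,
    # so the result is inverted with "!".
    ! tail -n +2 "$1" | grep -v '^[AB]' > /dev/null
}

# Possible use in the calling script:
#   if ! ids_ok TEST1.CSV; then
#       echo "error: an ID does not start with A or B" >&2
#       exit 1
#   fi
```

Without the tail step, the header line "ID ,, VALUE" would itself count as a bad row, because it starts with I.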

SAMZ · July 21, 2008, 7:17am

OK, I have found this to be the best solution thus far. Does anyone have any better thoughts?

if [ `cat TEST1.CSV | egrep -v '^A|^B' | wc -l` -gt 0 ]
then
echo "error message"
exit 1
fi

bakunin · July 21, 2008, 7:34am

SAMZ, until now you didn't tell us what you really want to achieve. The logic you used is ok, but - depending on what you want to do, the part inside "carry on doing what you want" - there may or may not be better solutions. We are able to help you only if we know exactly what you want to achieve.

Maybe your way (that is: to do it in shell script language) is good, because the shell is best suited to what you want to achieve. The other solutions (awk, sed, ...) may or may not be better suited than your solution because the respective tools may or may not be better suited to what you want to do - to finally assess this question we would need to know what you want to do.

I hope this helps.

bakunin

Franklin52 · July 21, 2008, 7:40am

Perhaps something like this?

egrep -v '^A|^B' TEST1.CSV > /dev/null 2>&1 && echo "error message"

Regards