Need HELP with AWK split. Need to check for "special characters" in string before splitting the file

shell_boy23 · July 30, 2012, 5:42am

Hi Experts.

I'm stuck with the AWK code below, where I'm trying to move records that contain any special characters in the last field to a bad file.

awk -F, '{if ($NF ~ /^[0-9]|^[A-Za-z]/) print >"goodfile";else print >"badfile"}' filename

sample data

1,abc,def,1234,A *
2,bed,dec,342,* A         
3,dec,345,23,*&^          
4,sdf,fgh,234,  
5,ert,345,ghj,C**
6,ert,345,sdf,123          ---- only valid record

The required output must have the first 5 records in badfile and the last record in goodfile.
But my awk logic above considers only the records below as badfile records:

2,bed,dec,342,* A 
3,dec,345,23,*&^ 
4,sdf,fgh,234,

The other two invalid records ("A *" and "C**") are being written to goodfile, which is wrong.
Please help me fix this.

Note: the $NF values can contain spaces between alphanumeric chars. However, a value that is all spaces or null is considered a bad record.

Thanks Gurus!

Klashxx · July 30, 2012, 5:50am

Tune your regexp to:

awk ' /[0-9]$|[A-Za-z]$/{print >"goodfile";next}{print >"badfile"}' infile

shell_boy23 · July 30, 2012, 6:14am

I tuned the regex as suggested, but it does not give the required output.
Code used:

awk -F, '{if ($NF ~ /[0-9]$|[A-Za-z]$/) print >"goodfile"; else print >"badfile"}' samp.txt

samp.txt

1,abc,def,1234,A *
2,bed,dec,342,* A
3,dec,345,23,*&^
4,sdf,fgh,234,
5,ert,345,ghj,C*2
6,ert,345,sdf,123

Output

$ cat goodfile
2,bed,dec,342,* A
5,ert,345,ghj,C*2
6,ert,345,sdf,123
$ cat badfile
1,abc,def,1234,A *
3,dec,345,23,*&^
4,sdf,fgh,234,

Klashxx · July 30, 2012, 6:21am

Check your syntax:

awk -F, '{if ($NF ~ /[0-9]$|[A-Za-z]$/) {print >"goodfile"} else {print >"badfile" }}'  infile

shell_boy23 · July 30, 2012, 6:29am

Sorry, Klashxx.
It gives the same output.

$ cat samp.txt
1,abc,def,1234,A *
2,bed,dec,342,* A
3,dec,345,23,*&^
4,sdf,fgh,234,
5,ert,345,ghj,C*2
6,ert,345,sdf,123

$ awk -F, '{if ($NF ~ /[0-9]$|[A-Za-z]$/) {print >"goodfile"} else {print >"badfile" }}' samp.txt

$ cat goodfile
2,bed,dec,342,* A
5,ert,345,ghj,C*2
6,ert,345,sdf,123

$ cat badfile
1,abc,def,1234,A *
3,dec,345,23,*&^
4,sdf,fgh,234,

I'm using Korn shell. Maybe that makes a difference?

RudiC · July 30, 2012, 6:49am

This only looks at $NF's last char. This

awk -F, '{if ($NF~/[0-9A-Za-z][0-9A-Za-z][0-9A-Za-z]/) print >"goodfile"; else print >"badfile"}' samp.txt

will work on the example, but it does not take into account the possible variable length of $NF. The repetition term /.../{length($NF)} does not seem to work, nor does the regex [[:alnum:]] construct.
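
One possible way around the variable-length problem, as a rough sketch: anchor the pattern at both ends with ^ and $ and use + (one or more) instead of an interval expression, keeping explicit ranges for awk versions that lack POSIX character classes. The file names are the ones used in this thread. The space inside the bracket expression is an assumption, based on the note that $NF may contain spaces between alphanumeric chars.

```shell
# Recreate the sample data posted earlier in the thread.
cat > samp.txt <<'EOF'
1,abc,def,1234,A *
2,bed,dec,342,* A
3,dec,345,23,*&^
4,sdf,fgh,234,
5,ert,345,ghj,C*2
6,ert,345,sdf,123
EOF

# Remove output from earlier runs so stale files cannot mislead.
rm -f goodfile badfile

# ^ and $ force the WHOLE last field to match the bracket expression;
# + accepts any length of one or more, so an empty field is rejected.
awk -F, '{
    if ($NF ~ /^[0-9A-Za-z ]+$/) {
        print > "goodfile"
    } else {
        print > "badfile"
    }
}' samp.txt
```

On this sample only record 6 should land in goodfile. A field made up of nothing but spaces would still pass this test, so it would need a separate check.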

elixir_sinari · July 30, 2012, 6:52am

Try this:

awk -F, '{if($NF~/^[[:alnum:][:blank:]]+$/) print > "goodfile"; else print > "badfile"}' infile
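
One edge case seems to remain with this pattern: a last field consisting only of blanks matches it, yet the original note says all spaces must count as a bad record. A minimal sketch that adds a second test requiring at least one alphanumeric character follows. Records 7 and 8 are hypothetical, added only to exercise the all-blank and embedded-space cases, and the sketch assumes an awk that supports POSIX character classes.

```shell
# Sample data from the thread plus two hypothetical records (7 and 8).
printf '%s\n' \
    '1,abc,def,1234,A *' \
    '2,bed,dec,342,* A' \
    '3,dec,345,23,*&^' \
    '4,sdf,fgh,234,' \
    '5,ert,345,ghj,C*2' \
    '6,ert,345,sdf,123' \
    '7,abc,def,123,   ' \
    '8,abc,def,123,A 1' > samp.txt

# Remove output from earlier runs so stale files cannot mislead.
rm -f goodfile badfile

# Good only if the last field consists entirely of alphanumerics and
# blanks AND contains at least one alphanumeric character.
awk -F, '{
    if ($NF ~ /^[[:alnum:][:blank:]]+$/ && $NF ~ /[[:alnum:]]/) {
        print > "goodfile"
    } else {
        print > "badfile"
    }
}' samp.txt
```

With this input, records 6 and 8 should go to goodfile and the rest, including the all-blank record 7, to badfile.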