awk + pattern search with regular expression

Hi,

I have a file with "|" (pipe) as the delimiter. I am looking for the count of records where the 5th field is a number exactly 15 digits long.

All records that meet this requirement are valid; all the rest are invalid. I need the count of valid records and the count of invalid records.

Can anyone please help?

awk -F\| '$5 ~ /^[0-9]{15}$/ { C++ }
END { print "Correct:  " C+0; print "Wrong: " NR - C }
' input_file

Thanks for the quick response.

I have already tried something similar, but it still reports all the records as invalid, which they are not.

Hi.

Please post a sample of your input data.

Thanks.

251|239|07807837304|234332108514613|UNKNOWN|0|447981296445            |00001000|5|21||4040|4700|0|0|0|2010-02-16 23:36:35|47.0|47.0|000000|2|871|871|0|0|ap-prepaidsessio|0|0000000000000|2010-02-16 23:36:35|0|0|0|0|0|2010-02-16 23:54:01|815602|0|67123384|A|N|||N||0||0.0||0|||58409557|Y|Y|0|3845856|
30|239|07890004717|23433|UNKNOWN|0|447842654598            |000020120|1949|0|||||0|0|2010-02-16 23:38:48|0.0|0.0|002100|2|91|91|0|0|447842654598|0|0000000000000|2010-02-16 23:38:48|0|0|4108|0|0|2010-02-16 23:54:01|815602|0|23221133|A|N|||N||0||0.0||0|||23221022|Y|Y|0|3845856|
251|239|07800719062|234339811447329|UNKNOWN|0|447981296445            |00001000|76|21||5040|9300|0|0|0|2010-02-16 23:36:39|93.0|93.0|000000|80|871|871|0|0|ap-prepaidsessio|0|0000000000000|2010-02-16 23:36:39|0|0|0|0|0|2010-02-16 23:54:01|815602|0|31215218|A|N|||N||0||0.0||0|||31128259|Y|Y|0|3845856|
30|239|07581019247|23433|UNKNOWN|0|447973018951            |000020100|380|0|||||0|0|2010-02-16 23:38:48|0.0|0.0|002100|2|90|90|0|0|447975993569|0|0000000000000|2010-02-16 23:38:48|0|0|4108|0|0|2010-02-16 23:54:01|815602|0|68486442|A|N|||N||0||0.0||0|||59745744|Y|Y|0|3845856|
30|239|07929481292|23433|UNKNOWN|0|447846375536            |00002000|15|0||2073|||0|0|2010-02-16 23:38:49|0.0|0.0|002100|2|91|91|0|0|447846375536|0|0000000000000|2010-02-16 23:38:49|0|0|4108|0|0|2010-02-16 23:54:01|815602|0|67432661|A|N|||N||0||0.0||0|||58712609|Y|Y|0|3845856|
1|239|07896681517|234334905955530|355176032676650|134|0447503788274|000010600|926|21|||180|60|0|0|2010-02-16 23:36:01|180.0|180.0|001100|2|3|3|0|0|447503788274|0|0000000000000|2010-02-16 23:36:01|0|0|0|6|0|2010-02-16 23:54:01|815602|0|63760018|A|N|||N||0||0.0||0|||55147472|Y|Y|0|3845856|
1|239|07970459763|234332007934385|35686402900035|134|0447973015951|000010200|52|21|||60|20|0|0|2010-02-16 23:38:57|60.0|60.0|001100|2|2|2|0|0|447989702325|0|0000000000000|2010-02-16 23:38:57|0|0|0|6|0|2010-02-16 23:54:01|815602|0|66517576|A|N|||N||0||0.0||0|||57818716|Y|Y|0|3845856|
1|239|07875208615|234334109364652|357097008573710|134|0447973018951|00001000|34|21||2094|60|0|0|0|2010-02-16 23:39:01|60.0|60.0|001100|2|2|2|0|0|447896731221|0|0000000000000|2010-02-16 23:39:01|0|0|0|6|0|2010-02-16 23:54:01|815602|0|35761130|A|N|||N||0||0.0||0|||35551924|Y|Y|0|3845856|
1|239|07870669563|234332003666468|35557602905962|134|0353834026141|000010800|324|21|||240|80|0|0|2010-02-16 23:35:25|240.0|240.0|001100|2|800|800|0|0|353834026141|0|0000000000000|2010-02-16 23:35:25|0|0|0|6|0|2010-02-16 23:54:01|815602|0|63698172|A|N|||N||0||0.0||0|||55087027|Y|Y|0|3845856|
1|239|07792884405|234334403667880|358568035578590|134|0447973014951|000010200|432|21|1|-2|0|20|0|0|2010-02-16 23:27:51|720.0|720.0|001100|2|60|60|0|0|447964118503|0|0000000000000|2010-02-16 23:27:51|0|0|0|6|0|2010-02-16 23:54:01|815602|0|30812176|A|N|||N||0||0.0||0|||30731698|Y|Y|0|3845856|

Of the above records, only 3 are valid (highlighted in red).

Please help

Hi.

Based on your input, the output I get is:

$ ./Test
Correct:  3
Wrong: 7

Please find the code and output below:

awk -F\| '$5 ~ /^[0-9]{15}$/ { C++ }
> END { print "Correct:  " C+0; print "Wrong: " NR - C }
> ' aa
Correct:  0
Wrong: 10

I am working on a Solaris box. Could that be the issue?

Probably :)

Try using nawk, or /usr/xpg4/bin/awk...
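A quick way to see whether a given awk understands the {15} repetition syntax at all is to feed it a one-line test. This is only a rough sketch: supports_intervals is a made-up helper name, and the $(...) substitution assumes a POSIX shell such as ksh or /usr/xpg4/bin/sh.

```shell
# Hypothetical helper: prints "yes" if the awk passed as $1 matches
# "111" against /^1{3}$/, and "no" if it prints nothing.
supports_intervals() {
    out=$(echo 111 | "$1" '/^1{3}$/ { print "yes" }')
    echo "${out:-no}"
}
```

For example, supports_intervals /usr/xpg4/bin/awk and supports_intervals nawk can be compared side by side; an awk that answers "no" will count every record as wrong with the {15} pattern.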


OK, just fired up my old Solaris workhorse (well, more of a lame three-legged donkey, actually).

And it gives the right output with /usr/xpg4/bin/awk, but doesn't work with nawk.
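That result suggests nawk on this box does not handle the {15} interval expression. If switching awk is not an option, the same check can probably be written without intervals: test the length of the field and make sure it contains no non-digit character. A minimal sketch, where count_valid is just an illustrative wrapper name and the input file is passed as its first argument:

```shell
# Count records whose 5th pipe-delimited field is exactly 15 digits,
# without using the {15} interval syntax.
count_valid() {
    awk -F'|' '
        # valid: field is 15 characters long and has no non-digit in it
        length($5) == 15 && $5 !~ /[^0-9]/ { c++ }
        END { print "Correct: " c + 0; print "Wrong: " NR - c }
    ' "$1"
}
```

Call it as count_valid input_file. It relies only on length() and a negated bracket expression, so it should not depend on interval support; whether the oldest /usr/bin/awk accepts it has not been checked here.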

Using Perl (the pattern is anchored so that only fields of exactly 15 digits match):

perl -wlanF'\|' -e '$F[4] =~ /^\d{15}$/ and $c++;
END { $c ||= 0; printf "Correct = %d\nWrong = %d\n", $c, $. - $c }' infile.txt


Thanks a lot, Scott.

It is working fine now :)