awk - how to compare part of the string?

vegasluxor · October 11, 2013, 7:27am

Need help with awk.
The file has comma-separated fields, and I need to check the digits that come before the last 10 digits of a number, e.g. (001)1234567890.
Basically, I want to check the country code of a mobile number.

eg:

abc,def,data, data,0011234567890, data,data

The script should compare the country code against 001; I will pass the country code in a variable.
Thanks!

Subbeh · October 11, 2013, 7:36am

awk -F, -v c=001 'substr($5,1,3)==c' file

This will print the line if the first three characters of the 5th field are 001.
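
A small sketch of the same check, assuming a start position of 1: awk string positions are 1-based, and a start of 0 is handled differently across awk implementations. The first sample line is the one from the question; the second is made up for contrast.

```shell
# Two sample records; only the first has 001 at the start of field 5.
out=$(printf '%s\n' \
  'abc,def,data, data,0011234567890, data,data' \
  'abc,def,data, data,0021234567890, data,data' |
  awk -F, -v c=001 'substr($5, 1, 3) == c')
printf '%s\n' "$out"
```

Only the first record should be printed.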

vegasluxor · October 11, 2013, 7:54am

Thanks!
But how do I find the 3 digits before the last 10 digits, counting from the right? The number could be longer or shorter than 10 digits... if it is longer, it might have multiple leading zeros. So I need to compare from the right side of the string.

eg: number 1231234567890
country (123)1234567890

eg: number 00000000121234567890
country 012

eg: number 12345
country: nothing to compare, as the number is shorter than 10 digits

Please help !

Subbeh · October 11, 2013, 8:36am

try this:

awk -F, -v c=001 'substr($5,length($5)-12,3)==c' file
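
How the offset works: the last 10 characters occupy positions length-9 through length, so the 3 characters before them start at length-12. Below is a hedged sketch that runs the three sample numbers from the post above through that expression. The explicit length test is my addition, not part of the one-liner: some awk implementations may clamp a start position below 1 to 1, which could make a short number match by accident. The variable name cc is made up for illustration.

```shell
# For each sample number, show the 3 characters before the last 10.
# The length test keeps short numbers from producing anything, whatever
# the awk implementation does with a start position below 1.
out=$(printf '%s\n' 1231234567890 00000000121234567890 12345 |
  awk '{
    cc = (length($0) >= 13) ? substr($0, length($0) - 12, 3) : ""
    printf "%s -> [%s]\n", $0, cc
  }')
printf '%s\n' "$out"
```

This should show 123, 012 and an empty result, in that order, matching the three cases described in the question.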

vegasluxor · October 14, 2013, 3:35am

Thanks a lot ! It worked !!!

ctsgnb · October 14, 2013, 6:18am

even shorter :

awk -F, -v c=001 '$5~c"..........$"' file

Note that this code does not check whether the 10 characters to the right of c are digits or not, but neither does the code provided by Subbeh.

CarloM · October 14, 2013, 12:37pm

$ cat file
abc,def,data, data,0011234567890, data,data
abc,def,data, data,00112345678901, data,data
abc,def,data, data,001123456789, data,data
abc,def,data, data,001123456789x, data,data
abc,def,data, data,0021234567890, data,data
$ awk -F, -v c=001 '$5~"^"c"[[:digit:]]{10}$"' file
abc,def,data, data,0011234567890, data,data

(GNU awk on cygwin)

ctsgnb · October 14, 2013, 12:56pm

Doesn't work on my VM ubuntu ...

$ cat tst2
abc,def,data, data,0011234567890, data,data
abc,def,data, data,00112345678901, data,data
abc,def,data, data,001123456789, data,data
abc,def,data, data,001123456789x, data,data
abc,def,data, data,0021234567890, data,data
$ awk -F, -v c=001 '$5~"^"c"[0-9]{10}"' tst2
$ awk -F, -v c=001 '$5~"^"c"..........$"' tst2
abc,def,data, data,0011234567890, data,data
abc,def,data, data,001123456789x, data,data
$ uname -a
Linux <anonymized> 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:50 UTC 2011 i686 i686 i386 GNU/Linux

This one works:

$ awk -F, -v c=001 '$5~"^"c"..........$"&&$5!~/[^0-9]/' tst2
abc,def,data, data,0011234567890, data,data
$

CarloM · October 14, 2013, 1:24pm

ERE/BRE, maybe? How about

$ awk -F, -v c=001 '$5~"^"c"[[:digit:]]\{10\}$"' file
awk: cmd. line:1: warning: escape sequence `\{' treated as plain `{'
awk: cmd. line:1: warning: escape sequence `\}' treated as plain `}'
abc,def,data, data,0011234567890, data,data

(which my GNU awk 4.1.0 doesn't like much)

Or the looong version, of course :).

$ awk -F, -v c=001 '$5~"^"c"[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$"' file
abc,def,data, data,0011234567890, data,data

ctsgnb · October 14, 2013, 1:30pm

Of course the long version works fine
(I also thought about it earlier, but it was so long that I preferred to post the one with &&$5!~/[^0-9]/):

$ awk -F, -v c=001 '$5~"^"c"[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$"' tst2
abc,def,data, data,0011234567890, data,data
$

But the backslashed short one still doesn't :

$ awk -F, -v c=001 '$5~"^"c"[[:digit:]]\{10\}$"' tst2
$

CarloM · October 14, 2013, 2:10pm

Try adding --re-interval to the flags.
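
Note that --re-interval is a GNU awk option. If the awk on that box is not gawk (on Ubuntu the default may be mawk; that is a guess, the thread does not say), interval expressions like {10} may not be available at all. A possible workaround, sketched here as an assumption rather than a tested fix for that VM, is to build the ten-times-repeated digit class in a BEGIN block so that no interval syntax is needed. The variable names d and re are made up for illustration; the data is the test file from above.

```shell
# Build "[0-9]" repeated 10 times, then match ^<code><10 digits>$ on field 5.
out=$(printf '%s\n' \
  'abc,def,data, data,0011234567890, data,data' \
  'abc,def,data, data,00112345678901, data,data' \
  'abc,def,data, data,001123456789, data,data' \
  'abc,def,data, data,001123456789x, data,data' \
  'abc,def,data, data,0021234567890, data,data' |
  awk -F, -v c=001 '
    BEGIN { for (i = 0; i < 10; i++) d = d "[0-9]"; re = "^" c d "$" }
    $5 ~ re')
printf '%s\n' "$out"
```

Only the first record should come out. Dropping the leading "^" from re would allow extra digits, such as leading zeros, in front of the country code.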