Script for Country Codes

zanetti321 · March 31, 2008, 6:01am

Dear All

I have a file whose lines look like this:

ISC Egypt-Alex2 126 104541338 218926893238 f 1B

ISC BT-Colindale 26 249126190534 218913486850 b 29

ISC Egypt-Cairo2 199 129026052 218927661509 b 26

As you can see, $4 and $5 in each line are phone numbers. I want a script that processes this file and produces output like this:

ISC Egypt-Alex2 126 104541338 218926893238 f 1B Egypt-Vodafone Libya-Libyana

ISC BT-Colindale 26 249126190534 218913486850 b 29 ZAIN Sudan Libya-Madar

ISC Egypt-Cairo2 199 129026052 218927661509 b 26 Mobinil-Egypt Libya-Libyana

As you can see, I need two fields added according to the country codes of the phone numbers.

There are of course other country codes, so could anyone please show how to do this for a few country codes as an example? I will add the others by analogy.

Thanks and waiting
Zanetti321

zanetti321 · March 31, 2008, 6:04am

Adding to my last request: the file contains about 300,000 lines, so the script should be reasonably fast.

Thanks
Zanetti

era · March 31, 2008, 2:50pm

Not going to happen until you tell us where we can obtain the information that you want to be added to each line.

If you need it to be fast, it sounds like you might want to port all of this to an SQL database, depending somewhat on how often you will need to do this processing.

zanetti321 · March 31, 2008, 8:01pm

Dear Era

The information needed will be a simple file listing each name and its country code, for example:

Egypt-Vodafone 2010
Egypt-Mobinil- 2012
Vodafone-UK 447
ZAIN-Sudan 24912
Libya-Libyana 21892

So the script should add two new fields with the names of the countries matching the codes at the start of $4 and $5.

I can prepare a file and send it to you if this is the information you need; if not, please let me know.

Regarding speed, I would prefer not to load the data into an SQL database because it would waste time for me, so could you please suggest a substitute solution once I have sent you the file you need?

Thanks
Zanetti

era · April 1, 2008, 2:07am

You want speed, but loading it up into a faster engine is a waste of your time? Well, your call.

Load the prefix code to operator mapping into a hash and the rest will be trivial. Prepare to learn some Perl if you don't know it already.

#!/usr/bin/perl

use warnings;
use strict;

my %mapping;

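# Load "name prefix" pairs; the prefix is the trailing run of digits.
open (F, "mappings.txt") || die "$0: could not open mappings.txt: $!\n";
while (<F>) {
  chomp;
  my ($op, $code) = m/(.*\S)\s+(\d+)$/ or next;  # skip blank or malformed lines
  $mapping{$code} = $op;
}
close F;

# Longest prefixes first, so the alternation prefers the most specific code.
my $keys = join ("|", sort { length($b) <=> length($a) } keys %mapping);
my $r = qr/^($keys)/;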

while (<>)
{
  my @F = split;
  if (defined $F[3] && $F[3] =~ $r) {
    print "$F[3] matches $1, maps to $mapping{$1}\n";
  }
  if (defined $F[4] && $F[4] =~ $r) {
    print "$F[4] matches $1, maps to $mapping{$1}\n";
  }
}

This is only very briefly tested, proof of concept code. The output is not what you want, but I trust you can glue things together (by trial and error, if not otherwise). Google for a quick Perl intro if you are not familiar with the language, there are tons of those.

era · April 1, 2008, 2:16am

In case it's not painfully obvious, it expects the prefix-to-operator mapping table in a text file called mappings.txt in the current directory, and reads the data lines from standard input. Here is a demo run.

vnix$ ls -l map mappings.txt
-rw-r--r-- 1 era era 516 2008-04-01 09:06 map
-rw-r--r-- 1 era era  93 2008-04-01 09:02 mappings.txt

vnix$ perl ./map <<HERE
> ISC Egypt-Alex2 126 104541338 218926893238 f 1B
> ISC BT-Colindale 26 249126190534 218913486850 b 29
> ISC Egypt-Cairo2 199 129026052 218927661509 b 26
> HERE
218926893238 matches 21892, maps to Libya-Libyana
249126190534 matches 24912, maps to ZAIN-Sudan
218927661509 matches 21892, maps to Libya-Libyana

vnix$ cat mappings.txt
Egypt-Vodafone 2010
Egypt-Mobinil- 2012
Vodafone-UK 447
ZAIN-Sudan 24912
Libya-Libyana 21892

zanetti321 · April 1, 2008, 10:32am

Dear Era

Thanks for your reply. I just want to check a couple of things:

The mappings.txt file will contain a large number of country codes; will this affect anything, or is it OK?

As I understand your code, the first block loads mappings.txt into a hash, and the second block reads my lines and matches them against the hash contents.

Could you please tell me the meaning of each line, just so I understand? If this is too much trouble, no problem.

Thanks for co-operation
Zanetti

unilover · April 1, 2008, 4:28pm

If you like, you can also do it with the following two lines:

sed 's=\(.*\) \(.*\)=awk :{if($4~/^\2/)print $0" \1";else print}: phonelines.txt=' country-codes.txt|tr ':' "'"|sh|awk 'NF==8{print}' >newlines.txt
sed 's=\(.*\) \(.*\)=awk :{if($5~/^\2/)print $0" \1";else print}: newlines.txt=' country-codes.txt|tr ':' "'"|sh|awk 'NF==9{print}'

in which:

phonelines.txt is the file containing your data-lines and
country-codes.txt is the file containing the two-column phone-codes.

unilover · April 1, 2008, 4:51pm

Or, for faster code, execute:

awk "{$(sed 's=\(.*\) \(.*\)=if($4~/^\2/)print $0\" \1\"=' country-codes.txt)}" phonelines.txt|awk "{$(sed 's=\(.*\) \(.*\)=if($5~/^\2/)print $0\" \1\"=' country-codes.txt)}"

with country-codes.txt containing:

Egypt-Vodafone 2010
Egypt-Mobinil 2012
Vodafone-UK 447
ZAIN-Sudan 24912
Libya-Madar 21891
Libya-Libyana 21892
Country10-City4 104
Country12-City9 129

and phonelines.txt having your 3 sample lines, the output will be:

ISC Egypt-Alex2 126 104541338 218926893238 f 1B Country10-City4 Libya-Libyana
ISC BT-Colindale 26 249126190534 218913486850 b 29 ZAIN-Sudan Libya-Madar
ISC Egypt-Cairo2 199 129026052 218927661509 b 26 Country12-City9 Libya-Libyana
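
The same result can also be had in a single awk pass, which prints every input line exactly once and looks prefixes up in an array instead of testing one regex per code. What follows is only a sketch under assumptions: the function name annotate, the temporary directory and the UNKNOWN placeholder are made up for illustration, and it picks the longest listed prefix, which the thread never specifies. On Solaris the awk function probably needs nawk rather than the old /usr/bin/awk.

```shell
#!/bin/sh
# Sample data copied from the thread; the temporary paths are illustrative.
workdir=$(mktemp -d)
cat > "$workdir/country-codes.txt" <<'EOF'
Egypt-Vodafone 2010
Egypt-Mobinil 2012
Vodafone-UK 447
ZAIN-Sudan 24912
Libya-Madar 21891
Libya-Libyana 21892
Country10-City4 104
Country12-City9 129
EOF
cat > "$workdir/phonelines.txt" <<'EOF'
ISC Egypt-Alex2 126 104541338 218926893238 f 1B
ISC BT-Colindale 26 249126190534 218913486850 b 29
ISC Egypt-Cairo2 199 129026052 218927661509 b 26
EOF

# annotate CODES DATA (hypothetical helper): append two name fields per line.
annotate() {
  awk '
    # Longest listed prefix of num, or "UNKNOWN" when nothing matches.
    function lookup(num,    n, p) {
      for (n = maxlen; n > 0; n--) {
        p = substr(num, 1, n)
        if (p in name) return name[p]
      }
      return "UNKNOWN"
    }
    # First file (codes): remember each name under its prefix.
    NR == FNR {
      if (NF >= 2) {
        name[$2] = $1
        if (length($2) > maxlen) maxlen = length($2)
      }
      next
    }
    # Second file (data): append the names for fields 4 and 5.
    { print $0, lookup($4), lookup($5) }
  ' "$1" "$2"
}

annotate "$workdir/country-codes.txt" "$workdir/phonelines.txt"
```

With the sample files this prints the same three lines as above. To collect everything in one file, redirect the call, for example annotate country-codes.txt phonelines.txt > newlines.txt.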

zanetti321 · April 2, 2008, 10:35am

Dear Unilover

I would like to thank you very much for your code; I think it is the one I want.

Could you please explain the sed scripting you used? For example, what does /n.* do, and so on for the rest of the code you created?

Regarding speed, can your code handle up to, for example, 500,000 lines in a short time, or will it take a long time?

Thanks very much again
Zanetti

unilover · April 2, 2008, 11:44am

OK, this is what you have to do in order to understand how this code works:

Execute:

sed 's=\(.*\) \(.*\)=if($4~/^\2/)print $0\" \1\"=' country-codes.txt

It will convert each line of country-codes.txt into an "if-statement" in the awk language that instructs awk:

if the fourth field (of the read-in line) starts with the "number", then print the entire line followed by the "name".

Thus, if you run awk on the phonelines.txt file with those generated "if-statements" as the commands executed on each read-in line, the result is the appropriate Country-City-Name appended to the end of the line, based on the starting digits of the fourth field.

You can see it for yourself, by executing:

awk "{$(sed 's=\(.*\) \(.*\)=if($4~/^\2/)print $0\" \1\"=' country-codes.txt)}" phonelines.txt
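
To make the generated program concrete, here is a small sketch with a two-line code list. The variable names codes and generated and the temporary file are illustrative only; the sed substitution is the one from the thread, written with plain double quotes because here the sed script is not nested inside a double-quoted string.

```shell
#!/bin/sh
# Two entries taken from the thread's country-codes.txt.
codes=$(mktemp)
printf '%s\n' 'ZAIN-Sudan 24912' 'Libya-Libyana 21892' > "$codes"

# \1 is the name (everything before the last space), \2 is the number.
generated=$(sed 's=\(.*\) \(.*\)=if($4~/^\2/)print $0" \1"=' "$codes")
printf '%s\n' "$generated"
# prints:
#   if($4~/^24912/)print $0" ZAIN-Sudan"
#   if($4~/^21892/)print $0" Libya-Libyana"

# Wrapping the generated statements in braces gives awk a complete program.
printf '%s\n' 'ISC BT-Colindale 26 249126190534 218913486850 b 29' |
  awk "{$generated}"
# prints: ISC BT-Colindale 26 249126190534 218913486850 b 29 ZAIN-Sudan
```

Note that a line whose fourth field matches none of the statements produces no output at all, and a line matching two of them is printed twice.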

Now we have to add the second Country-City-Name to the end of the line, based on the starting digits of the fifth field of the read-in line.

We do this by piping the output of the above command to another awk that has no phonelines.txt as its file parameter (so it is forced to read the output of the previous awk from the pipe). For its set of "if-statements" the same sed is used, but this time it generates "$5~/^ ..." instead of "$4~/^ ...", to instruct the second awk:

if the fifth field (of the read-in line) starts with the "number", then print the entire line followed by the "name".

Now, as for the speed, I'm afraid I don't have a 500000-line phonelines.txt file and, more than that, it depends on the computer on which you're running these commands. But one thing I am very sure of is that both sed and awk are very fast at what they do.
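
Lacking a real 500000-line file, one way to get a rough timing is to build a synthetic one by repeating the three sample lines from the first post. This is only a sketch: the temporary file name is illustrative, and the repeated lines are test data, not real records.

```shell
#!/bin/sh
# Build a 500000-line test file from the three sample lines in the thread.
big=$(mktemp)
awk 'BEGIN {
  line[0] = "ISC Egypt-Alex2 126 104541338 218926893238 f 1B"
  line[1] = "ISC BT-Colindale 26 249126190534 218913486850 b 29"
  line[2] = "ISC Egypt-Cairo2 199 129026052 218927661509 b 26"
  for (i = 0; i < 500000; i++) print line[i % 3]
}' > "$big"

# Confirm the size before timing anything against it.
wc -l < "$big"
```

Running the pipeline under test against this file with time in front, where the shell provides it, then gives a rough figure for that particular machine.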

As for how the sed command generates the "if-statements", I would refer you to the description of "Regular Expressions", either on the internet or in the man pages.

Good Luck;)

zanetti321 · April 3, 2008, 9:32am

Dear Unilover

Thanks for your code, but when I typed the script you gave me, the result was:
Variable Syntax

So I think there may be a problem in the syntax.

Please advise

Zanetti

unilover · April 3, 2008, 9:44am

I believe you must have made a typing mistake. Please copy what you typed and paste it here so that I can see where the mistake is.

zanetti321 · April 3, 2008, 11:02am

Dear Unilover
This is what I typed for the faster code:

geo@spiserver /home/geo/cdr/ItalyRel 25 > awk "{$(sed 's=\(.*\) \(.*\)=if($4~/^\2/)print $0\" \1\"=' country-codes.txt)}" phonelines.txt|awk "{$(sed 's=\(.*\) \(.*\)=if($5~/^\2/)print $0\" \1\"=' country-codes.txt)}"
Variable syntax

and these are the files in my directory:

134 Apr 3 16:56 country-codes.txt
-rw-r--r-- 1 geo geo 0 Apr 3 16:57 final
-rw-r--r-- 1 geo geo 874008 Mar 31 17:50 italy2
-rw-r--r-- 1 geo geo 1151226 Apr 3 17:11 phonelines.txt
-rwxr-xr-x 1 geo geo 133 Apr 3 17:09 script*

unilover · April 3, 2008, 11:51am

This is very strange!! Your command is exactly as it should be!!

Try the following and let me have the result:

uname -a
echo "{$(sed 's=\(.*\) \(.*\)=if($4~/^\2/)print $0\" \1\"=' country-codes.txt)}" >awk_4
head awk_4
tail awk_4
awk -f awk_4 phonelines.txt >first_added
head first_added
tail first_added
echo "{$(sed 's=\(.*\) \(.*\)=if($5~/^\2/)print $0\" \1\"=' country-codes.txt)}" >awk_5
head awk_5
tail awk_5
awk -f awk_5 first_added

zanetti321 · April 3, 2008, 1:41pm

Dear Unilover

I typed the commands you gave me; the result is:
geo@spiserver /home/geo/cdr/ItalyRel 5 > uname -a
echo "{$(sed 's=\(.*\) \(.*\)=if($4~/^\2/)print $0\" \1\"=' country-codes.txt)}" >awk_4
head awk_4
tail awk_4
awk -f awk_4 phonelines.txt >first_added
head first_added
tail first_added
echo "{$(sed 's=\(.*\) \(.*\)=if($5~/^\2/)print $0\" \1\"=' country-codes.txt)}" >awk_5
head awk_5
tail awk_5
awk -f awk_5 first_added
SunOS spiserver 5.8 Generic_108528-23 sun4u sparc SUNW,Sun-Fire-280R
Variable syntax

The same "Variable syntax" error. Note that I am using Solaris 8, which may be causing the problem, but I don't think so.

Anyway, the first code you gave me (before the faster code) is working, but I want to pipe the final result into a single file, which I can't do; moreover, that code is very slow.

Most importantly, when I type the first command:
sed 's=\(.*\) \(.*\)=if($4~/^\2/)print $0\" \1\"=' country-codes.txt

it generates the if statements as you said, but when I type this command on its own:
awk "{$(sed 's=\(.*\) \(.*\)=if($4~/^\2/)print $0\" \1\"=' country-codes.txt)}" phonelines.txt

the result is "Variable syntax".

Please advise

era · April 4, 2008, 2:42am

If it's a really old shell it might not understand $(...)

zanetti: Does $(echo echo moo) print "moo" for you?

Does echo "{moo}" print "{moo}" for you?

Does echo "{$(echo moo)}" print "{moo}" for you?

vnix$ $(echo echo moo)
moo
vnix$ echo "{moo}"
{moo}
vnix$ echo "{$(echo moo)}"
{moo}
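
For reference, the two spellings of command substitution side by side. This is a minimal sketch to run under a Bourne-style shell (sh, ksh, bash); the variable names are illustrative. The "Variable syntax" wording reads like a csh or tcsh message rather than a Bourne shell one, but that is a guess the thread does not confirm.

```shell
#!/bin/sh
# $(...) needs a POSIX-style shell; backticks also work in older Bourne shells.
new_style="{$(echo moo)}"
old_style="{`echo moo`}"
echo "$new_style $old_style"
# prints: {moo} {moo}
```

If the login shell turns out to be csh or tcsh, putting the commands in a file whose first line is #!/bin/sh and using the backtick form is one way around it.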

unilover · April 4, 2008, 9:39am

OK, looks like era is right. Please try this:

echo "{`sed 's=\(.*\) \(.*\)=if($4~/^\2/)print $0\" \1\"=' country-codes.txt`}" >awk_4
head awk_4
tail awk_4
awk -f awk_4 phonelines.txt >first_added
head first_added
tail first_added
echo "{`sed 's=\(.*\) \(.*\)=if($5~/^\2/)print $0\" \1\"=' country-codes.txt`}" >awk_5
head awk_5
tail awk_5
awk -f awk_5 first_added

and then copy and paste whatever is displayed.

real_mc · April 4, 2008, 10:56am

It works very well

zanetti321 · April 4, 2008, 12:09pm

I am not at a location right now where I can run it, but I want to ask a question:

Will the script in unilover's last reply direct the final output into one file?

And regarding speed, does it seem good?

Thanks a lot
zanetti