merging two files where countries are repeated

I have gone through various posts in this forum but could not locate where
the repeated columns are there .Hence posted this post.

File 1(colon seperated file)

Abc : 1234 : London
Def : 2456 : Paris
Efg :1456 : Newyork
Fgh : 1324 : Tokyo
Hjk  : 5829 : California 
Lmn : 7823 : Chicao
Klm : 6472 : Munich
Jkm : 8231 : Franfurt
Acl : 3241 : Hiroshima
Jmp : 2385 : Detroit
Lqm : 4613 : Kansas

File 2(coma seperated file)

USA , Newyork
USA , California
USA , Sanfransisco
USA , Detroit
USA , Chicago
USA , Kansas
USA , St.Louise
USA , Florida
UK , London
FRANCE , Paris
FRANCE , Marsellie.
INDIA , NewDelhi
INDIA , Mumbai
INDIA , Calcutta
INDIA , Chennai
INDIA , Bangalore
JAPAN , Tokyo
JAPAN , Nagasaki
JAPAN , Hirosima
CANADA , Ottawa
CANADA , Montreal
CANADA , Vancuvor
ARGENTINA , Buenos Aires
BRAZIL , Brassila
BRAZIL , Sao Paulo
ITALY , Rome
NORVEY , Oslo
SPAIN , Madrid
GERMANY , Frankfurt
GERMANY , Munich,
USSR , Moscow
USSR , Laningrad
USSR , Vladvostok

Output File expected as

Abc , 1234 , London ,UK
Def  , 2456 , Paris , France
Efg  , 3456 , Newyork , USA
Fgh , 1324 , Tokyo , Japan
Hjk  , 5829 , California , USA
Lmn , 7823 , Chicago, USA
Klm , 6472 , Munich , Germany
Jkm , 8231 , Frankfurt , Germany
Acl  , 3241 , Hiroshima , Japan
Jmp , 2385 , Detroit , USA
Lqm , 4613 , Kansas , USA

So far I have used join command but here it is not working as the cities are under the label countries.

Result required is Display the name of the country against the name of the city .
Tried to use

awk -F":," 'NR==FNR{ key[$1,$2]=$2;next } $1[key] { print $1,$2,$3} ' "OFS=,"  file2 file1 >outputfile

Thanks in advance.

awk -F'[:,]' 'NR==FNR{key[$2]=$1;next}{print $1,$2,$3," "key[$3]}' OFS=, file2 file1 >outputfile

Anurag ,
Thanks lot for your reply but it is not working , only the 1st instance is
shown and not all the COUNTRIES names are mentioned.I have tested your
code today and found the above things.
i.e. The output shows as :
OUTPUT FILE

Abc , 1234 , London ,UK
Def , 2456 , Paris , France
Efg , 3456 , Newyork , USA
Fgh , 1324 , Tokyo , Japan
Hjk , 5829 , California ?
Lmn , 7823 , Chicago ?
Klm , 6472 , Munich , Germany
Jkm , 8231 , Frankfurt ?
Acl , 3241 , Hiroshima ?
Jmp , 2385 , Detroit ?
Lqm , 4613 , Kansas ?

That is the 1st instance of country will be displayed or joined and not in subsequent events ??

I hope I have made things clear .

It is not working because your data doesn't match.

Brownie points for you if you could find the mismatch in each case yourself.

$
$ # display the contents of file2
$
$ cat file2
USA , Newyork
USA , California
USA , Sanfransisco
USA , Detroit
USA , Chicago
USA , Kansas
USA , St.Louise
USA , Florida
UK , London
FRANCE , Paris
FRANCE , Marsellie.
INDIA , NewDelhi
INDIA , Mumbai
INDIA , Calcutta
INDIA , Chennai
INDIA , Bangalore
JAPAN , Tokyo
JAPAN , Nagasaki
JAPAN , Hirosima
CANADA , Ottawa
CANADA , Montreal
CANADA , Vancuvor
ARGENTINA , Buenos Aires
BRAZIL , Brassila
BRAZIL , Sao Paulo
ITALY , Rome
NORVEY , Oslo
SPAIN , Madrid
GERMANY , Frankfurt
GERMANY , Munich,
USSR , Moscow
USSR , Laningrad
USSR , Vladvostok
$
$ # display the contents of file1
$
$ cat file1
Abc : 1234 : London
Def : 2456 : Paris
Efg :1456 : Newyork
Fgh : 1324 : Tokyo
Hjk  : 5829 : California
Lmn : 7823 : Chicao
Klm : 6472 : Munich
Jkm : 8231 : Franfurt
Acl : 3241 : Hiroshima
Jmp : 2385 : Detroit
Lqm : 4613 : Kansas
$
$ # run a Perl one-liner joining city to country
$
$ perl -lne 'if ($ARGV eq "file2"){/^(\w+)\s*,\s*(\w+)$/; $x{$2}=$1} else {/^.*:\s*(\w+)$/; $_="$_ , $x{$1}"; s/:/,/g; print}' file2 file1
Abc , 1234 , London , UK
Def , 2456 , Paris , FRANCE
Efg ,1456 , Newyork , USA
Fgh , 1324 , Tokyo , JAPAN
Hjk  , 5829 , California  ,
Lmn , 7823 , Chicao ,
Klm , 6472 , Munich ,
Jkm , 8231 , Franfurt ,
Acl , 3241 , Hiroshima ,
Jmp , 2385 , Detroit , USA
Lqm , 4613 , Kansas , USA
$
$

tyler_durden

Try this,

awk -F"[:,]" 'NR==FNR{a[$2]=$1;next;} a[$3] { print $1,$2,$3,a[$3]}' file2 OFS="," file1
1 Like

some updates base on pravin27's code, to fix the problem if there are different spaces or tabs before and after city and country name.

awk -F"[:,]" '
function s(a) {gsub(/^[\t ]*/,"",a);gsub(/[\t ]*$/,"",a);return a} 
NR==FNR{a[s($2)]=s($1);next;} a[s($3)] { print  $1,$2,$3 " , " a[s($3)]}
' file2 OFS="," file1
1 Like

Hi
Pravin27 and rdcwayx,
Thanks lot to both you for the code and working perfectly as per my requirement
,One request , can you explain in details the logic for a[$3] ? in details ?
Is it a loop ? Plese if you can explain the code in details ,I really appreciate
your help anyway .

A BIG THANKS ONCE AGAIN FOR THE CODE

With due regards and respect

Vakharia M J :slight_smile:

To thank in this forum is, click the Thanks Thumb.

a[$3] is used to build an array a in awk.

1 Like

Hi rdcwayx
Sorry I did not click the Thumb impression ,now corrected and I appreciate your help but
the moot question was the a[3] will have the value of 3rd field of File1 or file2 ?
secondly What I wanted to know is
(1) should we use which file 1st and why
(2) how fields i.e. the common on which we want to join or likewise and use them in array ?
(3) In case of more instancies in my case , loooping is necessary ?
Can you guide me rdcwayx please
Thanks in advance,

awk -F"[:,]" '
NR==FNR{a[$2]=$1;next;}   # (1) It only builds array a for file2 (that's the first file in awk command) , NR==FNR is used to control it. 
a[$3]  # (2) then check if we have the same value in arrary a a[$3]'s value is not 0 or null. 
{ print $1,$2,$3,a[$3]}' file2  # First file
 OFS="," file1  # second file

For the third question, I am not really understand. Do you mean you need update more files with country and city, then just add the file name one by one.

awk .....  file2 OFS="," file1 file3 file4 file5
1 Like

So kind of you rdcwayx ,wish you nice weekend .