Delete file2.txt from file1.txt using scripting

beanbaby · November 22, 2012, 12:56am

Hi,

I'm a total newbie. My requirement is this: I have 2 files.

I want to identify which countries I do not currently have in my db.

How can I use grep or another command to find them?

I want to match all-countries.txt against countries-in-db.txt so that the output is the list of countries which are not currently in my db.

Any command that I can use to achieve this would be helpful.

Thanks

ss81 · November 22, 2012, 1:08am

Hi Beanbaby,

I think you can use diff file1.txt file2.txt. It will show you the difference between the two files.

itkamaraj · November 22, 2012, 1:08am

$ cat db.txt
US
SG
AU
$ cat country.txt 
US
IN
GB
JP
SG
AU
$ awk 'NR==FNR{a[$0]++;next}{if (!($1 in a))print}' db.txt country.txt
IN
GB
JP

beanbaby · November 22, 2012, 1:28am

Thank you so much!

However, my db has 127 countries and the whole-world list has 260 countries. The result from the above command

awk 'NR==FNR{a[$0]++;next}{if (!($1 in a))print}' db.txt country.txt

is 219.

Should it not be close to 130 (the countries that are not listed in db.txt)?

Thanks!

itkamaraj · November 22, 2012, 1:32am

Post some contents of your files and the expected output.

---------- Post updated at 12:02 PM ---------- Previous update was at 12:00 PM ----------

If your country names contain spaces, like "Saudi Arabia", then try this instead. It looks up the whole line ($0) rather than only the first field ($1):

awk 'NR==FNR{a[$0]++;next}{if (!($0 in a))print}' db.txt country.txt
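To see why the $1 version over-reports, here is a small sketch that runs both variants on the sample names used later in the thread. The temporary directory and the variable names by_field and by_line are made up for the demo only.

```shell
# Sample data; the temporary directory exists only for this demo.
tmp=$(mktemp -d)
printf '%s\n' 'Uganda' 'United States' 'Uruguay' > "$tmp/db.txt"
printf '%s\n' 'Uganda' 'United Arab Emirates' 'United Kingdom' \
    'United States' 'Uruguay' > "$tmp/country.txt"

# Lookup on $1 (first word only): "United" is never a key in a[],
# so every "United ..." line is printed, including "United States".
by_field=$(awk 'NR==FNR{a[$0]++;next}{if (!($1 in a))print}' \
    "$tmp/db.txt" "$tmp/country.txt")

# Lookup on $0 (the whole line): only lines missing from db.txt are printed.
by_line=$(awk 'NR==FNR{a[$0]++;next}{if (!($0 in a))print}' \
    "$tmp/db.txt" "$tmp/country.txt")

printf 'by $1:\n%s\n\nby $0:\n%s\n' "$by_field" "$by_line"
rm -rf "$tmp"
```

With $1, "United States" is reported as missing even though it is in db.txt, which would explain a count that is too high.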

beanbaby · November 22, 2012, 1:40am

Hello,

Yes, there are country names made up of two or three words, like the examples below.

allcountries.txt

Uganda
United Arab Emirates
United Kingdom
United States
Uruguay

db.txt

Uganda
United States
Uruguay

result.txt

should contain the countries not listed in my database, like the ones below:

United Arab Emirates
United Kingdom

Many Thanks!

itkamaraj · November 22, 2012, 1:43am

 
$ awk 'NR==FNR{a[$0]++;next}{if (!($0 in a))print}' db.txt country.txt
United Arab Emirates
United Kingdom

msathees · November 22, 2012, 1:47am

Try this:

grep -vxFf all-countries.txt countries-in-db.txt

This gives the records from countries-in-db.txt which are not present in all-countries.txt. To list the countries missing from the db instead, swap the two file names.
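The argument order matters with grep -f: the patterns are read from the file given to -f, and lines are filtered out of the last file. A minimal sketch with sample names from this thread (the temporary directory and the variable names missing and reversed are assumptions for the demo):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'Uganda' 'United States' 'Uruguay' > "$tmp/countries-in-db.txt"
printf '%s\n' 'Uganda' 'United Arab Emirates' 'United Kingdom' \
    'United States' 'Uruguay' > "$tmp/all-countries.txt"

# -F: patterns are fixed strings, -x: match whole lines only, -v: invert.
# Patterns from the db list, filtering the full list:
# prints the countries that are missing from the db.
missing=$(grep -vxFf "$tmp/countries-in-db.txt" "$tmp/all-countries.txt")

# Reversed order: prints db entries absent from the full list (none here).
# grep exits with status 1 when it selects no lines, hence the "|| true".
reversed=$(grep -vxFf "$tmp/all-countries.txt" "$tmp/countries-in-db.txt" || true)

printf 'missing from db:\n%s\n\nreversed order:\n%s\n' "$missing" "$reversed"
rm -rf "$tmp"
```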

beanbaby · November 22, 2012, 1:50am

Sweet, it looks like it is getting the right results for multi-word countries like United States, but not for single-word countries: Thailand is in both .txt files but still shows up in the results?

Thanks

elixir_sinari · November 22, 2012, 2:01am

awk 'FNR==NR{sub(/[ \t]*$/,"");a[$0];next}
{sub(/[ \t]*$/,"")}
!($0 in a)' db.txt allcountries.txt
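The sub() calls strip trailing spaces and tabs from each line before it is stored or looked up. The thread never shows the offending lines, so the sketch below only assumes that "Thailand" carried a trailing space in one of the files; the sample data and the variable names untrimmed and trimmed are made up for illustration.

```shell
tmp=$(mktemp -d)
# Assumption: the db entry has an invisible trailing space after "Thailand".
printf '%s\n' 'Thailand ' 'Uganda' > "$tmp/db.txt"
printf '%s\n' 'Thailand' 'Uganda' 'United Kingdom' > "$tmp/allcountries.txt"

# Without trimming, "Thailand " and "Thailand" are different keys,
# so Thailand is wrongly reported as missing.
untrimmed=$(awk 'NR==FNR{a[$0];next} !($0 in a)' \
    "$tmp/db.txt" "$tmp/allcountries.txt")

# With trailing blanks removed from both files, only real gaps remain.
trimmed=$(awk 'FNR==NR{sub(/[ \t]*$/,"");a[$0];next}
{sub(/[ \t]*$/,"")}
!($0 in a)' "$tmp/db.txt" "$tmp/allcountries.txt")

printf 'untrimmed:\n%s\n\ntrimmed:\n%s\n' "$untrimmed" "$trimmed"
rm -rf "$tmp"
```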

beanbaby · November 22, 2012, 2:06am

Thanks, that seems to have worked perfectly.

Many Thanks!

RudiC · November 22, 2012, 4:20am

Try

$ grep -vf db.txt allcountries.txt
United Arab Emirates
United Kingdom

If spaces lead to inconsistencies, try removing them from both files first (the <(...) process substitution needs bash, ksh or zsh):

tr -d " " <file | grep -vf <(tr -d " " <file2)
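One caveat with plain grep -vf: each line of the pattern file is treated as a regular expression that may match anywhere in a line, so a short name can hide a longer one. A small sketch of the effect, using sample data that is not from the thread (the variable names substring and exact are likewise made up):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'Niger' > "$tmp/db.txt"
printf '%s\n' 'Niger' 'Nigeria' 'Norway' > "$tmp/allcountries.txt"

# Pattern "Niger" also matches inside "Nigeria", so Nigeria is dropped too.
substring=$(grep -vf "$tmp/db.txt" "$tmp/allcountries.txt")

# -x (whole line) and -F (fixed string) restrict matches to identical lines.
exact=$(grep -vxFf "$tmp/db.txt" "$tmp/allcountries.txt")

printf 'grep -vf:\n%s\n\ngrep -vxFf:\n%s\n' "$substring" "$exact"
rm -rf "$tmp"
```

Adding -x and -F, as in the earlier grep suggestion, avoids this kind of false match.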