Hello,
I am a total Linux newbie and I can't seem to find a solution to this little problem.
I have two text files containing huge lists of URLs; let's call them file1.txt and file2.txt.
What I want to do is take each URL from file2.txt, search for it in file1.txt, and, if it is found, delete it from file1.txt.
What would be the best way to go about this? Would I need a full bash script, or does anyone know a simple one-liner to do it?
while IFS= read -r i; do printf '%s\n' "$i" | grep -v -x -F -f file2; done < file1
That will output only the lines in file1 that are not found in file2 (-F matches the URLs as fixed strings rather than regexes, and -x requires a whole-line match). If you want, you can redirect the output to another file to create a new file1:
while IFS= read -r i; do printf '%s\n' "$i" | grep -v -x -F -f file2; done < file1 > file1.new
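The loop starts a separate grep process for every line of file1, which is likely a large part of why it is slow on big files. A minimal sketch of the same filter as a single grep call, which reads each file once; the temporary directory and sample URLs are hypothetical stand-ins for the real file1 and file2:

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory; use your real file1 and file2 instead.
dir=$(mktemp -d)
printf '%s\n' 'http://example.com/a' 'http://example.com/b' 'http://example.com/c' > "$dir/file1"
printf '%s\n' 'http://example.com/b' > "$dir/file2"

# -f file2 : read the patterns (one URL per line) from file2
# -F       : treat each pattern as a fixed string, so "." and "?" in URLs are not regex syntax
# -x       : match whole lines only, so one URL does not remove a longer URL that contains it
# -v       : print the lines of file1 that match none of the patterns
grep -v -x -F -f "$dir/file2" "$dir/file1" > "$dir/file1.new"

cat "$dir/file1.new"
```

With the sample data this prints http://example.com/a and http://example.com/c.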
Yeah, sorry, I typed double quotes instead of single quotes.
With sed, double quotes are usually used so that the shell expands the variable inside the expression. When you use variables inside sed, you should be extra careful.
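The sed command being discussed is not shown in the thread, so this is a hypothetical illustration of the quoting point, assuming a made-up sample file and URL. It deletes one URL with sed, once with the expression in single quotes and once in double quotes:

```shell
#!/bin/sh
# Hypothetical sample file and URL for the demo.
dir=$(mktemp -d)
printf '%s\n' 'http://example.com/a' 'http://example.com/b' > "$dir/demo.txt"
url='http://example.com/a'

# In a sed address, \|regex| uses "|" as the delimiter instead of "/",
# so the slashes in the URL need no escaping. The dots in the URL are
# still regex wildcards, which is harmless here but can cause false matches.

# Single quotes: the shell does not expand $url, so sed receives the literal
# text "$url", no line matches, and both lines are printed.
single=$(sed '\|$url|d' "$dir/demo.txt")

# Double quotes: the shell substitutes the value of $url before sed runs,
# so the matching line is deleted.
double=$(sed "\|$url|d" "$dir/demo.txt")

printf 'single quotes:\n%s\n' "$single"
printf 'double quotes:\n%s\n' "$double"
```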
vidyadhar85, that didn't seem to work. I don't know what might be wrong, but it isn't finding any common lines.
file1.txt has 1687 lines, 472 of which are also in file2.txt.
I am trying redoubtable's solution, but it's a bit slow: it has been running for a little over 10 minutes, which is understandable since the files are big.
1) Load all the URLs from file1.txt into a hash map.
2) Read file2.txt one line at a time.
3) If the entry from file2.txt is found in the hash map built from file1.txt, delete that hash map entry.
4) Repeat until the end of file2.txt.
5) Write the remaining hash map entries to another file; this gives the output you want.
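The five steps above map fairly directly onto awk, whose associative arrays can play the role of the hash map. A minimal sketch, assuming the sample files below stand in for the real file1.txt and file2.txt; it also keeps an index of line numbers (an addition not in the steps) so the output stays in file1.txt's original order, since iterating over the hash map alone would print in an unspecified order:

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' 'http://example.com/a' 'http://example.com/b' 'http://example.com/c' > "$dir/file1.txt"
printf '%s\n' 'http://example.com/b' 'http://example.com/x' > "$dir/file2.txt"

awk '
  # Step 1: while reading the first file, store every URL as a key of the
  # associative array "keep" and remember the line order in "order".
  FILENAME == ARGV[1] { order[++n] = $0; keep[$0] = 1; next }
  # Steps 2-4: for each line of the second file, delete the matching key if present.
  ($0 in keep) { delete keep[$0] }
  # Step 5: print the URLs whose keys are still present, in their original order.
  END { for (i = 1; i <= n; i++) if (order[i] in keep) print order[i] }
' "$dir/file1.txt" "$dir/file2.txt" > "$dir/file1.new"

cat "$dir/file1.new"
```

Each file is read once, and each lookup or delete is a single array access, so this should scale much better than running grep once per line.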
Just a question: what is wrong with merging the files first and then using "sort -u"? I suppose you want to add the contents of one file to the other without creating duplicates, right? If so:
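A minimal sketch of that merge, assuming hypothetical sample files in place of the real file1.txt and file2.txt. sort accepts several input files, so no separate concatenation step is needed:

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' 'http://example.com/b' 'http://example.com/a' > "$dir/file1.txt"
printf '%s\n' 'http://example.com/a' 'http://example.com/c' > "$dir/file2.txt"

# sort reads both files, sorts all their lines together,
# and -u keeps only one copy of each identical line.
sort -u "$dir/file1.txt" "$dir/file2.txt" > "$dir/merged.txt"

cat "$dir/merged.txt"
```

Note that this produces the union of both files in sorted order. It does not remove file2.txt's URLs from file1.txt, which is what the original question asked for.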