Subtracting two files

beginner_99 · August 11, 2016, 10:09pm

Hi,

I want to subtract 2 files and save the remaining text in another file. Let's say,

File1

Hello
Happy  //
Hi
*
Hungry

File2

Happy
Hi

Output

Hello
 //
*
Hungry

I do not want to remove the whole line; I just need the exact subtraction. When I used grep -vf file2 file1, it removed the whole line. How do I get the exact subtraction? Thanks

Don_Cragun · August 11, 2016, 10:16pm

What operating system and shell are you using?

What have you tried to solve this problem on your own?

beginner_99 · August 11, 2016, 10:34pm

I'm using openSUSE 13.2. Yeah, I figured it out using:

grep -Fvf file2 file1

This removes the exact text that needs to be subtracted. By the way, can you help me differentiate between pointers and multiplications in C code? I need to eliminate the lines containing pointers from a C source file and save the rest to a text file using a shell script.

itkamaraj · August 12, 2016, 3:50am

How about using sed?

Try it with test data before applying this command to the real data.

Note that sed -i edits file1 in place.

while IFS= read -r pattern; do sed -i "s/$pattern//g" file1; done < file2

RudiC · August 12, 2016, 4:19am

Try also

awk 'FNR == NR {T[$1]; next} {for (t in T) sub (t, _)} 1' file2 file1
Hello
  //

*
Hungry

Don_Cragun · August 12, 2016, 8:21pm

Hi beginner_99,
Note that the command:

grep -Fvf file2 file1

doesn't just remove the strings found in file2 from file1; it removes every line from file1 that contains any string found in file2.
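To see this concretely, here is a small sketch that recreates the sample files from the first post in a scratch directory and runs the same grep on them. The mktemp directory and the grep_out variable are only there for illustration; they are not part of anything posted earlier in the thread.

```shell
# Recreate the sample files from the first post in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Hello' 'Happy  //' 'Hi' '*' 'Hungry' > "$tmpdir/file1"
printf '%s\n' 'Happy' 'Hi' > "$tmpdir/file2"

# -F: fixed strings, -v: print non-matching lines, -f: read patterns from file2.
grep_out=$(grep -Fvf "$tmpdir/file2" "$tmpdir/file1")
printf '%s\n' "$grep_out"

rm -rf "$tmpdir"
```

Only Hello, * and Hungry are printed. The // that followed Happy is gone along with the rest of its line, which is not the output asked for in the first post.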

And, assuming that the strings in file2 consist only of characters that are not "special" in a regular expression (RE), the code suggested by itkamaraj will not only remove the strings Happy and Hi from file1, it will also remove the string HHappyi: the 1st invocation of sed removes Happy, which leaves Hi, and the 2nd invocation of sed then removes that remaining Hi.
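The cascade is easy to reproduce without touching any files. The sketch below feeds the single string HHappyi through the same two substitutions, in the same order as the lines of file2; the here-document and the line variable are stand-ins chosen for this illustration, and plain sed in a pipeline is used instead of sed -i.

```shell
# Apply the patterns one after another, as the while/sed loop does.
line='HHappyi'
while IFS= read -r pattern; do
    line=$(printf '%s\n' "$line" | sed "s/$pattern//g")
done <<'EOF'
Happy
Hi
EOF

# The brackets make an empty result visible.
printf '[%s]\n' "$line"
```

After the first pass the string is Hi, and after the second pass it is empty, so the command prints [].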

The awk script suggested by RudiC will not only remove the strings Happy and Hi, but will also remove either the string HHappyi (as in itkamaraj's sed script) or the strings HHiappy, HaHippy, HapHipy, and HappHiy, depending on which of the two possible orders awk uses to process the two lines found in file2 (the order in which for (t in T) visits array elements is unspecified, and in the more general case there are N! possible orders for the N lines in file2).

If the strings you are processing might contain characters that are "special" in a regex used by awk or sed, you need either to use awk instead of sed, with index() and substr() instead of sub() so that the strings are matched literally, or to preprocess the REs in file2 to escape the "special" characters.
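As a rough sketch of the literal-string approach, the awk script below uses index(), which compares plain characters rather than a regex, together with substr() to cut out each occurrence. It is an illustration, not a drop-in answer: the extra * line in file2 is a hypothetical addition to show a "special" character, the patterns are applied in file2 order (stored in a numbered array, so the order is fixed), and it assumes file2 is not empty.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'Hello' 'Happy  //' 'Hi' '*' 'Hungry' > "$tmpdir/file1"
# Hypothetical file2: the two strings from the thread plus a regex-special "*".
printf '%s\n' 'Happy' 'Hi' '*' > "$tmpdir/file2"

literal_out=$(awk '
FNR == NR { pat[++n] = $0; next }   # first file: store each line as a literal string
{
    for (i = 1; i <= n; i++) {
        len = length(pat[i])
        if (len == 0) continue      # skip empty lines from file2
        out = ""
        rest = $0
        # index() compares plain characters, so "*" needs no escaping here
        while ((pos = index(rest, pat[i])) > 0) {
            out = out substr(rest, 1, pos - 1)
            rest = substr(rest, pos + len)
        }
        $0 = out rest
    }
    print
}' "$tmpdir/file2" "$tmpdir/file1")
printf '%s\n' "$literal_out"

rm -rf "$tmpdir"
```

With this data the Hi and * lines come out empty rather than disappearing; dropping empty lines would be a separate step if that is wanted.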

If you want to remove all remaining matching strings after removing strings on the first pass, you'll need to use awk or sed commands to delete matched strings in a loop until no more matches are found.
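One possible shape for such a loop, again only a sketch: keep making passes over each line until a full pass removes nothing. The input lines and the pats variable are made up for this example, the strings are matched literally with index(), and the list is split on spaces, so it assumes the strings themselves contain no whitespace.

```shell
result=$(printf '%s\n' 'HHappyi there' 'HHiappy' 'Hungry' |
awk -v pats='Happy Hi' '
BEGIN { n = split(pats, pat, " ") }   # pat[1] = "Happy", pat[2] = "Hi"
{
    do {
        changed = 0
        for (i = 1; i <= n; i++) {
            # remove every occurrence of this string, including ones
            # that only appear after an earlier removal
            while ((pos = index($0, pat[i])) > 0) {
                $0 = substr($0, 1, pos - 1) substr($0, pos + length(pat[i]))
                changed = 1
            }
        }
    } while (changed)                 # repeat until a pass changes nothing
    print
}')
printf '%s\n' "$result"
```

Here HHappyi there is reduced to " there" in one pass, HHiappy needs a second pass (removing Hi first exposes Happy) and ends up empty, and Hungry is left alone.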

Knowing these options, can you tell us if any of these issues matter with the real data you will be processing? And, if so, do you need help in figuring out how to make the needed changes to the suggestions you have received so far?