I have a large file (file1) that has 96770154 lines and a smaller file (file2) that has 3 lines. I want to remove all of the strings from file1 that occur in file2.
file1 looks like this:
DOGDOGNODOGTESTCAT
CATHELLOBYEEBYEFAT
CATCATDOGDOGCATYESGOOD
file2 looks like this:
YES
NO
GOOD
The output should look like this:
DOGDOGNODOGTESTCAT
CATCATDOGDOGCATYESGOOD
What I have so far is this but the output is not 2 lines. It is instead 6 lines for whatever reason that I'm not understanding:
while read A B; do grep -v $A file1; done < file2 > out
Try the following:
awk '{if(NR == FNR) {a[$0]} else {for (x in a) {if($0 ~ x) {print $0; next}}}}' file2 file1
The while loop calls grep three times (grep -v YES, grep -v NO and grep -v GOOD), and the output of all three runs is concatenated. Each run prints two lines of file1, which is why you end up with six.
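To see where the six lines come from, here is a small sketch that rebuilds the sample files from the first post in a scratch directory and counts what each grep contributes. The scratch directory and the variable names count and total are only for illustration.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' DOGDOGNODOGTESTCAT CATHELLOBYEEBYEFAT CATCATDOGDOGCATYESGOOD > file1
printf '%s\n' YES NO GOOD > file2

# Report how many lines each individual grep -v lets through.
while read -r A B; do
    count=$(grep -vc -- "$A" file1)
    echo "grep -v $A prints $count lines"
done < file2

# The original loop: every iteration appends its own output to the same stream.
while read -r A B; do grep -v -- "$A" file1; done < file2 > out
total=$(grep -c '' out)
echo "total lines in out: $total"
```

Each pattern is absent from two of the three lines, so every pass prints two lines and the loop writes six in total.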
Alternatively you could try without a while loop with:
grep -vf file2 file1
Note: DOGDOGNODOGTESTCAT will be left out since it contains the string "NO".
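For comparison, a minimal sketch of both directions on the sample data. I have added -F so the entries in file2 are taken as plain strings rather than regular expressions, which is an assumption about the real data. Note that the version without -v is the one that reproduces the two lines listed as expected output in the first post.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' DOGDOGNODOGTESTCAT CATHELLOBYEEBYEFAT CATCATDOGDOGCATYESGOOD > file1
printf '%s\n' YES NO GOOD > file2

# -v keeps only the lines that contain none of the strings in file2.
without_matches=$(grep -vFf file2 file1)

# Without -v, only the lines that contain at least one of the strings are kept.
with_matches=$(grep -Ff file2 file1)

printf 'grep -vFf:\n%s\n\ngrep -Ff:\n%s\n' "$without_matches" "$with_matches"
```

Either way file1 is read only once, which matters with a file of that size.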
Is there any way to output each of the output lines into its own separate file?
You mean every line of the output is a separate file?
grep -vf file2 file1 | split -l 1
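All lines that match YES will go to output_YES, lines that match NO will go to output_NO, and so on.
But if a line contains both YES and NO, the code below sends it to only one of those files (whichever pattern the for loop reaches first; awk does not guarantee the order).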
awk '{if(NR == FNR) {a[$0]} else {for (x in a) {if($0 ~ x) {print $0 > "output_" x; next}}}}' file2 file1
If you want a line that contains both YES and NO to be sent to both output files, use the following:
awk '{if(NR == FNR) {a[$0]} else {for (x in a) {if($0 ~ x) {print $0 > "output_" x}}}}' file2 file1
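The same per-pattern split can also be sketched with a plain shell loop and grep. This is only an alternative illustration, assuming the strings in file2 are literal text and are safe to use in a file name (no slashes). The variable name pat is made up.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' DOGDOGNODOGTESTCAT CATHELLOBYEEBYEFAT CATCATDOGDOGCATYESGOOD > file1
printf '%s\n' YES NO GOOD > file2

# One pass over file1 per pattern; a line that contains several patterns
# is written to each of the corresponding output files.
while IFS= read -r pat; do
    grep -F -- "$pat" file1 > "output_$pat"
done < file2

# Show how many lines ended up in each file.
for f in output_*; do
    echo "$f: $(grep -c '' "$f") line(s)"
done
```

Unlike the awk version, this reads file1 once per line of file2 and creates an empty output file for a pattern that matches nothing, so with a very large file1 and many patterns the single-pass awk is likely the better choice.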
This isn't quite what I mean. In general if I have a while loop
while read A B ; do
something
something else file1
done < file2 > out
How can I write a separate output file each time the while loop completes one cycle? The length of the output can vary, so I can't use split -l 1.
Is something like this what you're after?
while read A B
do
( something
something else file1 ) > out_$A
done < file2
So I am trying to do something like this, but what goes into, say, 1.out should not be the same output that goes into 2.out. Each out file should hold the output produced for a single line read by the while loop (and that output can vary in length).
for ((i=1;i<=3;i++));
do
while read A B
do
awk '{if(NR == FNR) {a[$0]} else {for (x in a) {if($0 ~ x) {print $0; next}}}}' file2 file1 > $i.out
done
done
You want something more like this then:
while read A B
do
((i++))
awk '{if(NR == FNR) {a[$0]} else {for (x in a) {if($0 ~ x) {print $0; next}}}}' file2 file1 > $i.out
done < file2
The general solution in #8 should be usable, but it would be better to use code grouping {} instead of a new subshell () for every iteration:
for/while something
do
{
code segment
} > file$((i+=1))
done
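A runnable instance of that pattern on the sample files might look like the sketch below. The two commands inside the braces are just stand-ins for "something" and "something else", and the N.out naming follows the earlier posts.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' DOGDOGNODOGTESTCAT CATHELLOBYEEBYEFAT CATCATDOGDOGCATYESGOOD > file1
printf '%s\n' YES NO GOOD > file2

i=0
while read -r A B; do
    {
        # Everything inside the braces shares one redirection, however many
        # lines it prints, and no extra subshell is started.
        echo "lines of file1 containing $A:"
        grep -F -- "$A" file1
    } > "$((i+=1)).out"
done < file2

# List the files that were produced, one per line of file2.
ls -- *.out
```

Because the braces run in the current shell, the counter i keeps its value from one iteration to the next, and each pass gets its own file regardless of how long its output is.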