How to fgrep -f two files and get only one instance of each matched line

Hello,

I have two files:

 
file1
x
v
r
g
 
file2 
 
aaaa,x,1111
bbbb,v,2222
bbbb,v,
cccc,r,3333
dddd,s,4444
eeee,q,5555
ffff,p,6666
 
output
 
aaaa,x,1111
bbbb,v,2222
cccc,r,3333
 
and not
 
aaaa,x,1111
bbbb,v,2222
bbbb,v,
cccc,r,3333

fgrep -f file1 file2 gives me the second output, which is not what I want.

Thanks,

At the cost of performance:

while IFS= read -r searchstring ; do grep -F -m 1 -- "$searchstring" file2 ; done < file1
awk -F , 'NR==FNR{a[$1];next} $2 in a{print;delete a[$2]}' file1 file2
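
For anyone reading later, here is a minimal sketch of how that one-liner behaves on the sample data from the first post. The temporary directory and the array name want are my own choices for illustration; the logic is meant to be the same as in the command above.

```shell
# Recreate the sample files from the first post in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' x v r g > "$tmp/file1"
printf '%s\n' aaaa,x,1111 bbbb,v,2222 bbbb,v, cccc,r,3333 \
              dddd,s,4444 eeee,q,5555 ffff,p,6666 > "$tmp/file2"

result=$(awk -F, '
  NR == FNR { want[$1]; next }            # first file: remember every key
  $2 in want { print; delete want[$2] }   # second file: print first hit, then forget the key
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because each key is deleted after its first match, the second bbbb,v, line is skipped, and file2 is read only once no matter how many keys file1 holds.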

Hi rdcwayx,
I think your solution is what I need. However, my lines are a bit more complicated than the example I gave, and your solution is based on that example, which is my fault. I was looking for something general, but it seems there is nothing that applies to every case.

The lines I need to match look more like this:

 
Network=XXX,Context=GG123,Element=1

I want whatever comes after the second equal sign and before the last comma, which in this case would be GG123.

Hi funksen,

I have thousands of lines, so I am not sure a while loop would be a good idea, as you mentioned.

Try this:

$ fgrep -f file1 file2 | nawk -F'[=,]' '!x[$4]++'

Hi jayan jay,

This is the error I get when running your solution:

 
user> fgrep -f not_upgraded_sites.txt not_upgraded.sel | nawk -F'[=,]' '!x[$4]++'
x[: Event not found. 

Please try with double quotes.

I get the same thing.


If anyone is interested, a combination of jayan jay's and rdcwayx's solutions works:

 
nawk -F"[=,]" 'NR==FNR{a[$1];next} $4 in a{print;delete a[$4]}' file1 file2


Hey jayan jay, can you explain to me why field $4?


jayan jay,

I tried all possible quoting combinations:

 
> fgrep -f not_upgraded_sites.txt not_upgraded.sel  | nawk -F"[=,]" "!x[$4]++"
x[: Event not found.
> fgrep -f not_upgraded_sites.txt not_upgraded.sel  | nawk -F'[=,]' "!x[$4]++"
x[: Event not found.
> fgrep -f not_upgraded_sites.txt not_upgraded.sel | nawk -F'[=,]' '!x[$4]++'
x[: Event not found.
> 
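
The "Event not found" message usually comes from csh or tcsh, where ! starts a history substitution even inside single or double quotes, so changing the quotes alone does not help. Assuming that is the shell in use here, two possible workarounds are to escape the character as \! or to write the awk condition without it. A rough sketch of the second option, run from a Bourne-style shell on made-up sample lines (awk is used in place of nawk, which may not be installed everywhere):

```shell
# In csh/tcsh the original filter could be typed as:  nawk -F'[=,]' '\!x[$4]++'
# The variant below avoids the ! character entirely, so it is safe in any shell.
result=$(printf '%s\n' \
    'Network=ONR,Context=FC1' \
    'Network=ONR,Context=FC1,Element=1' \
    'Network=ONR,Context=FC2,Element=1' |
  awk -F'[=,]' 'x[$4]++ == 0')   # true only the first time a given $4 is seen

printf '%s\n' "$result"
```

The condition x[$4]++ == 0 is meant as a drop-in equivalent of !x[$4]++: both pass a line only when its fourth field has not been counted before.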

Hope this clears it up:

$ echo "SubNetwork=XXX,MeContext=GG123,Element=1" | nawk -F'[=,]' '{print $4}'
GG123
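
To see why it is field 4, it may help to print every field. With -F'[=,]' both the equal sign and the comma act as separators, so names and values alternate. A small sketch (the variable names line and fields are only for illustration):

```shell
line='Network=XXX,Context=GG123,Element=1'

# Split on either = or , and print each field with its index.
fields=$(printf '%s\n' "$line" |
  awk -F'[=,]' '{ for (i = 1; i <= NF; i++) print i ": " $i }')

printf '%s\n' "$fields"
```

The value after the second equal sign lands in $4, right after the name Context in $3.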

Try this combination as well:

$ fgrep -f not_upgraded_sites.txt not_upgraded.sel | nawk -F"[=,]" '!x[$4]++'
 
actual file1:
FC1
FC3
FC4
FC2
FC5

actual file2:

 
Network=ONR,Context=FC6,Element=1
Network=ONR,Context=FC7,Element=1
Network=ONR,Context=FC0,Element=1
Network=ONR,Context=FC1
Network=ONR,Context=FC1,Element=1
Network=ONR,Context=FC2,Element=1
Network=ONR,Context=FC0,Element=1
Network=ONR,Context=FC6,Element=1

OK, as I understand it, your file1 will be something like:

GG123
GG124
XX123

file2

Network=XXX,Context=GG123,Element=1
Network=XXX,Context=GG123,Element=2
Network=XXX,Context=XX123,Element=1
Network=XXX,Context=GG124,Element=1
Network=XXX,Context=GG123,Element=1
awk -F , 'NR==FNR{a[$1];next} {split($2,b,"=");if (b[2] in a){print;delete a[b[2]]}}' file1 file2
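
A quick way to check this against the sample above is sketched below. The scratch directory and the array names want and kv are assumptions made for illustration; the approach is the same split-on-= idea.

```shell
# Build the two sample files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' GG123 GG124 XX123 > "$tmp/file1"
printf '%s\n' \
  'Network=XXX,Context=GG123,Element=1' \
  'Network=XXX,Context=GG123,Element=2' \
  'Network=XXX,Context=XX123,Element=1' \
  'Network=XXX,Context=GG124,Element=1' \
  'Network=XXX,Context=GG123,Element=1' > "$tmp/file2"

result=$(awk -F, '
  NR == FNR { want[$1]; next }     # first file: load the keys
  {
    split($2, kv, "=")             # "Context=GG123" gives kv[1]="Context", kv[2]="GG123"
    if (kv[2] in want) {           # exact key match, not a substring match
      print
      delete want[kv[2]]           # later lines with the same key are skipped
    }
  }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Unlike fgrep -f, which matches a key anywhere in the line, this compares the whole Context value, so a key such as GG12 would not match GG123.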