Grep for string and substitute if exists with a yes/no

Hi

I have 3 files in total: file1 is enriched.txt, file2 is repressed.txt and file3 is my content.txt.

What I need is to query the content file against both enriched and repressed and, wherever the gene symbol is the same in both files, add a yes value against it.

file1

Gene
ABC
XYZ
MNO

file2

Gene
PQR
DEF
GHI

file3

Gene exp1 exp2 exp3 exp4 exp5 enriched repressed
ABC 2 3 4 5 7 yes no 
MNO 5.5 6.8 7.6 5.5 4.2 yes no
DEF 1 4 6 7 9 no yes
GHI 3 5 6 7 8 0 no  yes
LMN 2.2 3.4 5.7 8.9 7.2 no no
TQR 4 5 6 8 8 no no

How can I add a new column to the existing file (file3) and add yes/no against each row in awk?

Thanks,

awk 'NR == FNR && FNR > 1{a[$0]; n=FNR; next}
  NR == FNR + n && FNR > 1{b[$0]; next}
  FNR == 1 {$9 = "colname"; print $0; next}
  {if($1 in a && $1 in b) {$9 = "yes"} else {$9 = "no"};
  print $0}' enriched.txt repressed.txt content.txt

With more than 50 questions posted, we hope that you have been learning from our suggestions. What code have you tried to solve this problem?

I tried it for a single-file search and was not successful:

awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' file1.txt file3.txt >output.txt

The code given above prints the header from all the files. Below is the corrected code:

awk 'NR == FNR{if(FNR > 1) a[$0]; n=FNR; next}
  NR == FNR + n{if(FNR > 1) b[$0]; next}
  FNR == 1 {$9 = "colname"; print $0; next}
  {if($1 in a && $1 in b) {$9 = "yes"} else {$9 = "no"};
  print $0}' enriched.txt repressed.txt content.txt

awk -f di.awk enriched.txt repressed.txt content.txt
(the order matters)
where di.awk:

FILENAME == ARGV[1] {
   if(FNR==1) next
   enr[$1]
   next
}
FILENAME == ARGV[2] {
   if(FNR==1) next
   rep[$1]
   next
}
{
  if (FNR==1) {print $0, "enriched", "repressed";next}
  print $0, ($1 in enr)?"yes":"no", ($1 in rep)?"yes":"no"
}
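For anyone who wants to try this without creating di.awk first, here is a minimal sketch that builds the sample files in a scratch directory and runs the same logic inline. It assumes content.txt holds only the six input columns (no enriched/repressed columns yet); the scratch directory and the `result` variable are just for illustration.

```shell
#!/bin/sh
# Sketch: sample data from the thread, written to a temporary directory.
# Assumption: content.txt has only Gene + exp1..exp5 on input.
dir=$(mktemp -d)
printf '%s\n' Gene ABC XYZ MNO > "$dir/enriched.txt"
printf '%s\n' Gene PQR DEF GHI > "$dir/repressed.txt"
printf '%s\n' 'Gene exp1 exp2 exp3 exp4 exp5' \
              'ABC 2 3 4 5 7' \
              'DEF 1 4 6 7 9' \
              'LMN 2.2 3.4 5.7 8.9 7.2' > "$dir/content.txt"

result=$(awk '
FILENAME == ARGV[1] { if (FNR > 1) enr[$1]; next }    # genes in the enriched list
FILENAME == ARGV[2] { if (FNR > 1) rep[$1]; next }    # genes in the repressed list
FNR == 1 { print $0, "enriched", "repressed"; next }  # extend the header line
{ print $0, ($1 in enr) ? "yes" : "no", ($1 in rep) ? "yes" : "no" }
' "$dir/enriched.txt" "$dir/repressed.txt" "$dir/content.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

With that sample data, ABC should come out as "yes no", DEF as "no yes" and LMN as "no no".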

Thanks for the email. When I apply it to real data I see only one column, and all the values are no. I should be seeing 2 columns, one for enriched and one for repressed, where it says (yes no) or (no yes) or (no no). Also, what is column 9? I have enriched and repressed in columns 7 and 8 respectively.

Thanks,

Not sure who this statement is for...

Maybe I'm confused about what you're trying to do. I assumed that your file3.txt on input is something like:

Gene exp1 exp2 exp3 exp4 exp5
ABC 2 3 4 5 7
MNO 5.5 6.8 7.6 5.5 4.2
DEF 1 4 6 7 9
GHI 3 5 6 7 8 0 
LMN 2.2 3.4 5.7 8.9 7.2
TQR 4 5 6 8 8

and that the contents of file3 you showed us in the 1st message in this thread was the desired output. You could do that with something like:

awk '
FNR == 1 {
	if(++f > 2) print $0, "enriched", "repressed"
	next
}
f < 3 {	gene[f,$1]
	next
}
{	print $0, (1,$1) in gene ? "yes" : "no", (2,$1) in gene ? "yes" : "no"
}' file[1-3].txt

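The test in your code:

 if ($0 ~ key)

is an unanchored regular-expression match against the whole line, so it succeeds whenever key appears anywhere in the line, not only in the first field. For example, on the 1st data line in file3.txt when key is ABC, the test would expand to:

if ("ABC 2 3 4 5 7" ~ "ABC")

which is true, but it would also be true for a line whose gene was ABCD. Perhaps you meant:

if ($1 == key)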

But, I thought you were trying to add two columns of "yes" or "no" to the contents of file3.txt at the end of the lines. Instead you're adding the string "file3.txt" to the start of a line from file3.txt???

Please explain more clearly what file3.txt is on input and exactly what output you want to produce.
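As a small illustration of how `$0 ~ key` and `$1 == key` differ, here is a sketch using made-up lines (ABCD and the trailing ABC are not from your data; they are only there to show the effect). The variable names are hypothetical.

```shell
#!/bin/sh
# Sketch with invented sample lines to contrast the two tests.
lines='ABC 2 3
ABCD 5 6
XYZ 7 ABC'

# Unanchored regex match on the whole line: every line containing "ABC" counts.
regex_hits=$(printf '%s\n' "$lines" | awk -v key=ABC '$0 ~ key { n++ } END { print n + 0 }')

# Exact string comparison on the first field only.
exact_hits=$(printf '%s\n' "$lines" | awk -v key=ABC '$1 == key { n++ } END { print n + 0 }')

echo "regex: $regex_hits exact: $exact_hits"
```

The regex test counts all three lines, while the comparison on `$1` counts only the first one, which is usually what you want when looking up gene symbols.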


Thanks vgersh99. It was for Srini.

Your code worked, but I need the enriched and repressed result columns to be tab-delimited rather than space-delimited.

awk -v OFS='\t' -f di.awk enriched.txt repressed.txt content.txt
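One thing worth knowing about OFS, shown here as a rough sketch on a single sample line: `print $0, x, y` only puts OFS between the arguments of print, so the original columns keep their spaces. If you want every column tab-separated, assigning to a field (for example `$1 = $1`) makes awk rebuild `$0` with OFS. The variable names below are just for illustration.

```shell
#!/bin/sh
# Sketch: one space-separated input line with two values appended.
line='ABC 2 3 4 5 7'

# OFS joins the arguments of print; $0 itself is left as it was read,
# so only the two appended columns are preceded by a tab.
appended=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ print $0, "yes", "no" }')

# Assigning to a field forces awk to rebuild $0 using OFS,
# so all columns end up tab-separated.
rebuilt=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ $1 = $1; print $0, "yes", "no" }')

printf '%s\n%s\n' "$appended" "$rebuilt"
```

Which of the two you want depends on whether the existing columns in content.txt should stay space-separated or not.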

I misunderstood your requirement.
And the email you received must be an automated one from unix.com :)

awk 'NR == FNR{if(FNR > 1) a[$0];
    n=NR;
    next}
  NR == FNR + n{if(FNR > 1) b[$0];
    next}
  FNR == 1{$7="enriched";
    $8="repressed";
    print $0;
    next}
  {$7 = $8 = "no";
  if($1 in a) $7 = "yes";
  if($1 in b) $8 = "yes";
  print $0}' OFS='\t' enriched.txt repressed.txt content.txt

Thank you Srini and vgersh99. Both of your solutions worked.