Grep for a string and substitute yes/no depending on whether it exists

Diya123 · April 30, 2014, 2:32pm

Hi

I have three files in total: file1 is enriched.txt, file2 is repressed.txt, and file3 is my content.txt.

What I need is to query the content file against both enriched.txt and repressed.txt and, wherever a gene symbol in the content file also appears in one of those files, add a "yes" value against it.

file1

Gene
ABC
XYZ
MNO

file2

Gene
PQR
DEF
GHI

file3

Gene exp1 exp2 exp3 exp4 exp5 enriched repressed
ABC 2 3 4 5 7 yes no 
MNO 5.5 6.8 7.6 5.5 4.2 yes no
DEF 1 4 6 7 9 no yes
GHI 3 5 6 7 8 0 no  yes
LMN 2.2 3.4 5.7 8.9 7.2 no no
TQR 4 5 6 8 8 no no

How can I add a new column to the existing file (file3) with a yes/no value against each row, using awk?

Thanks,

SriniShoo · April 30, 2014, 2:59pm

awk 'NR == FNR && FNR > 1{a[$0]; n=FNR; next}
  NR == FNR + n && FNR > 1{b[$0]; next}
  FNR == 1 {$9 = "colname"; print $0; next}
  {if($1 in a && $1 in b) {$9 = "yes"} else {$9 = "no"};
  print $0}' enriched.txt repressed.txt content.txt

Don_Cragun · April 30, 2014, 2:59pm

With more than 50 questions posted, we hope that you have been learning from our suggestions. What code have you tried to solve this problem?

Diya123 · April 30, 2014, 3:19pm

I tried it for a single-file search and was not successful:

awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' file1.txt file3.txt >output.txt

SriniShoo · April 30, 2014, 3:47pm

The code given above prints the header from all the files. Below is the corrected code:

awk 'NR == FNR{if(FNR > 1) a[$0]; n=FNR; next}
  NR == FNR + n{if(FNR > 1) b[$0]; next}
  FNR == 1 {$9 = "colname"; print $0; next}
  {if($1 in a && $1 in b) {$9 = "yes"} else {$9 = "no"};
  print $0}' enriched.txt repressed.txt content.txt
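
Both versions rely on the NR/FNR idiom: NR counts records across all input files, while FNR restarts at 1 for each file. Printing both counters for two small throwaway files (a.txt and b.txt are hypothetical names, created in a scratch directory) shows the pattern:

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

printf 'Gene\nABC\nXYZ\n' > a.txt   # 3 lines
printf 'Gene\nPQR\n' > b.txt        # 2 lines

# NR keeps counting across files; FNR restarts at 1 in each file.
awk '{ print FILENAME, NR, FNR }' a.txt b.txt
# a.txt 1 1
# a.txt 2 2
# a.txt 3 3
# b.txt 4 1
# b.txt 5 2
```

In b.txt, NR equals FNR + 3, where 3 is the line count of a.txt; that is the n the script saves while reading the first file. One caveat: if the first file is empty, NR == FNR is also true for the second file, which is one reason to prefer a test on FILENAME such as FILENAME == ARGV[1].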

vgersh99 · April 30, 2014, 3:57pm

awk -f di.awk enriched.txt repressed.txt content.txt
(the order of the files matters),
where di.awk is:

FILENAME == ARGV[1] {
   if(FNR==1) next
   enr[$1]
   next
}
FILENAME == ARGV[2] {
   if(FNR==1) next
   rep[$1]
   next
}
{
  if (FNR==1) {print $0, "enriched", "repressed";next}
  print $0, ($1 in enr)?"yes":"no", ($1 in rep)?"yes":"no"
}
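
A quick way to try di.awk is to recreate a few rows of the sample data from the first post and run it. This sketch assumes content.txt holds only the Gene and exp1 to exp5 columns (the enriched and repressed columns are what the script adds) and works in a scratch directory:

```shell
cd "$(mktemp -d)" || exit 1

# A few rows of the sample data from the first post.
printf 'Gene\nABC\nXYZ\nMNO\n' > enriched.txt
printf 'Gene\nPQR\nDEF\nGHI\n' > repressed.txt
printf '%s\n' 'Gene exp1 exp2 exp3 exp4 exp5' \
  'ABC 2 3 4 5 7' 'MNO 5.5 6.8 7.6 5.5 4.2' 'DEF 1 4 6 7 9' \
  'LMN 2.2 3.4 5.7 8.9 7.2' > content.txt

# di.awk as posted, condensed onto fewer lines.
cat > di.awk <<'EOF'
FILENAME == ARGV[1] { if (FNR == 1) next; enr[$1]; next }
FILENAME == ARGV[2] { if (FNR == 1) next; rep[$1]; next }
{
  if (FNR == 1) { print $0, "enriched", "repressed"; next }
  print $0, ($1 in enr) ? "yes" : "no", ($1 in rep) ? "yes" : "no"
}
EOF

awk -f di.awk enriched.txt repressed.txt content.txt
# Gene exp1 exp2 exp3 exp4 exp5 enriched repressed
# ABC 2 3 4 5 7 yes no
# MNO 5.5 6.8 7.6 5.5 4.2 yes no
# DEF 1 4 6 7 9 no yes
# LMN 2.2 3.4 5.7 8.9 7.2 no no
```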

Diya123 · April 30, 2014, 3:58pm

Thanks for the email. When I apply it to my real data, I see only one new column, and all of its values are "no". I should be seeing two columns, one for enriched and one for repressed, showing (yes no), (no yes), or (no no). Also, what is column 9? I have enriched and repressed in columns 7 and 8, respectively.

Thanks,

vgersh99 · April 30, 2014, 4:01pm

Not sure who this statement is for...

Don_Cragun · April 30, 2014, 4:06pm

Maybe I'm confused about what you're trying to do. I assumed that your file3.txt on input is something like:

Gene exp1 exp2 exp3 exp4 exp5
ABC 2 3 4 5 7
MNO 5.5 6.8 7.6 5.5 4.2
DEF 1 4 6 7 9
GHI 3 5 6 7 8 0 
LMN 2.2 3.4 5.7 8.9 7.2
TQR 4 5 6 8 8

and that the contents of file3 you showed us in the first message in this thread were the desired output. You could produce that with something like:

awk '
FNR == 1 {
	if(++f > 2) print $0, "enriched", "repressed"
	next
}
f < 3 {	gene[f,$1]
	next
}
{	print $0, (1,$1) in gene ? "yes" : "no", (2,$1) in gene ? "yes" : "no"
}' file[1-3].txt
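
This version stores both lists in one array, gene, keyed by the file number and the gene symbol. awk joins the two subscripts with SUBSEP, and (i, key) in gene tests for that combined key. A tiny illustration with hard-coded values (not read from the data files):

```shell
awk 'BEGIN {
  gene[1, "ABC"]              # referencing an element creates it (empty value)
  gene[2, "DEF"]
  a = (1, "ABC") in gene      # ABC is in list 1
  b = (2, "ABC") in gene      # ABC is not in list 2
  c = (2, "DEF") in gene      # DEF is in list 2
  print a, b, c               # prints: 1 0 1
}'
```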

The test in your code:

 if ($0 ~ key)

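would not do what you want, because ~ is a regular-expression match rather than an equality test. For example, on the first data line in file3.txt, when key is ABC, the test expands to:

if ("ABC 2 3 4 5 7" ~ "ABC")

which succeeds whenever ABC appears anywhere in the line, so it would also match the line for a gene such as ABCD.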

Perhaps you meant:

if ($1 == key)
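
The difference between ~ and == shows up as soon as one gene symbol is a prefix of another. A minimal check, using a made-up gene ABCD, followed by the exact-match lookup done with $1 in keys instead of a loop over every key (file1.txt and file3.txt here are small stand-ins created in a scratch directory):

```shell
cd "$(mktemp -d)" || exit 1

awk 'BEGIN {
  line = "ABCD 2 3 4 5 7"     # hypothetical gene ABCD, not from the original data
  split(line, f, " ")         # f[1] is the first field, "ABCD"
  m = line ~ "ABC"            # regex match anywhere in the line
  e = f[1] == "ABC"           # exact comparison of the first field
  print m, e                  # prints: 1 0
}'

# Exact match without a loop: look $1 up in the keys array (skipping headers).
printf 'Gene\nABC\n' > file1.txt
printf '%s\n' 'Gene exp1' 'ABC 2' 'ABCD 3' > file3.txt
awk 'NR == FNR { keys[$1]; next } FNR > 1 && ($1 in keys)' file1.txt file3.txt
# prints: ABC 2
```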

But I thought you were trying to add two columns of "yes" or "no" to the end of each line of file3.txt. Instead, you're adding the string "file3.txt" to the start of the matching lines from file3.txt???

Please explain more clearly what file3.txt is on input and exactly what output you want to produce.

Diya123 · April 30, 2014, 4:06pm

Thanks, vgersh99. It was for Srini.

Your code worked, but I need the enriched and repressed result columns to be tab-delimited rather than space-delimited.

vgersh99 · April 30, 2014, 4:10pm

awk -v OFS='\t' -f di.awk enriched.txt repressed.txt content.txt
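
A detail worth knowing about -v OFS='\t': awk puts OFS between the items of a print list and uses it when it rebuilds a modified record, but an unmodified $0 keeps its original spacing. With di.awk, that means only the two appended columns become tab-separated. A sketch with one made-up line:

```shell
# Only the separators produced by print's commas become tabs; $0 keeps its spaces.
printf 'ABC 2 3\n' | awk -v OFS='\t' '{ print $0, "yes", "no" }'
# ABC 2 3<TAB>yes<TAB>no

# Assigning a field (here $1 = $1) makes awk rebuild $0 with OFS everywhere.
printf 'ABC 2 3\n' | awk -v OFS='\t' '{ $1 = $1; print $0, "yes", "no" }'
# ABC<TAB>2<TAB>3<TAB>yes<TAB>no
```

If every column should be tab-separated, one option (a suggestion, not part of the post) is to add $1 = $1 at the start of the last block of di.awk.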

SriniShoo · April 30, 2014, 4:28pm

I misunderstood your requirement.
The email you received must have been an automated notification from unix.com.

awk 'NR == FNR{if(FNR > 1) a[$0];
    n=NR;
    next}
  NR == FNR + n{if(FNR > 1) b[$0];
    next}
  FNR == 1{$7="enriched";
    $8="repressed";
    print $0;
    next}
  {$7 = $8 = "no";
  if($1 in a) $7 = "yes";
  if($1 in b) $8 = "yes";
  print $0}' OFS='\t' enriched.txt repressed.txt content.txt
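
One possible refinement, not something from the thread: this script keys a and b on the whole line ($0), so a list line with a stray trailing space or tab would never match $1 from content.txt. Keying on $1 instead (a[$1], b[$1]) avoids that, since default field splitting drops surrounding blanks. A one-line illustration:

```shell
# "ABC " (trailing space) as $0 is a different key from "ABC" as $1.
printf 'ABC \n' | awk '{ a[$0]; b[$1] } END { x = ("ABC" in a); y = ("ABC" in b); print x, y }'
# prints: 0 1
```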

Diya123 · April 30, 2014, 4:48pm

Thank you, Srini and vgersh99. Both of your solutions worked.