Replacing characters in a file

rossi · July 15, 2014, 11:35am

Suppose I have a file with 1000 columns (5 shown as an example).
Each column holds two letters separated by a space, and the columns are separated by a tab:

A A"\t"C C"\t"G G"\t"0 0"\t"T T
A G"\t"C C"\t"G G"\t"A T"\t"0 0
G A"\t"0 0"\t"G C"\t"A A"\t"T C

Whenever there is a 0 0 in any column, the output should be printed as:

A A"\t"0 0"\t"G G"\t"0 0"\t"0 0
A G"\t"0 0"\t"G G"\t"0 0"\t"0 0
G A"\t"0 0"\t"G C"\t"0 0"\t"0 0

Can someone tell me how to solve this problem for 1000 columns?

Thank You in advance

Scrutinizer · July 15, 2014, 1:49pm

Hi, try:

awk '$0~s{$2=s; for(i=4; i<=NF; i++) $i=s}1' s="0 0" FS='\t' OFS='\t' file

RudiC · July 15, 2014, 1:53pm

Not clear. What are the columns? Are those columns space separated or TAB separated? Your sample shows five, not four columns. Will Col three always be retained? What about Col 1000?

Please specify more carefully in English.

rossi · July 15, 2014, 2:58pm

Hi

Sorry, the example has 5 columns.
Let's take the 1st row.
There is a space between A A, tab 0 0, tab G G, tab 0 0, tab T T.
Column three won't always be retained.

---------- Post updated at 06:58 PM ---------- Previous update was at 06:49 PM ----------

Hi,

Thank you so much, I tested it on the above example and it works.
Can you explain the code a bit? For example, why i=4 and $2=s?

Thanks once again

Scrutinizer · July 15, 2014, 3:10pm

Hi, sure:

awk '
  $0~s {                         # if the line contains the search string in variable s
    $2=s                         # set the second field to that string
    for(i=4; i<=NF; i++) $i=s    # and set field 4 and higher to that string (leaving only fields 1 and 3 unchanged)
  }
  1                              # print the line
' s="0 0" FS='\t' OFS='\t' file  # set s (the search string variable) to "0 0"
                                 # and set both input and output field separators to TAB

rossi · July 15, 2014, 3:44pm

Hi

I understand now from the comments what i=4 does in this example.

But what if I don't know the exact positions of the columns that have 0 0 in the 1000-column file? For example, suppose column 4 doesn't have 0 0, column 5 has 0 0, columns 6 and 7 have no 0 0, column 8 has 0 0, and so on.

What needs to be done in this case?
Hope that I was able to explain the situation a bit.

Thanks

RudiC · July 15, 2014, 3:49pm

Let me guess: Is the requirement that if a column has "0 0" in any row, the entire column should become "0 0"? That would be an entirely different approach. You'd need to read the entire file to decide if any column goes zero and then print out all rows/columns.
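
As a rough illustration of that idea, here is a minimal sketch that buffers all rows in memory, notes which columns hold "0 0" in any row, and prints everything at the end. The function name zero_columns and the array names are hypothetical, and the sketch assumes real TAB separators and a file small enough to fit in memory.

```shell
# Hypothetical helper: reads TAB-separated rows from stdin or from the named files.
zero_columns() {
  awk -F'\t' -v OFS='\t' -v s="0 0" '
    {
      row[NR] = $0                            # buffer every row in memory
      for (i = 1; i <= NF; i++)
        if ($i == s) zero[i]                  # note each column that holds s in any row
    }
    END {
      for (r = 1; r <= NR; r++) {             # replay the buffered rows
        n = split(row[r], f, FS)
        if (n == 0) printf "%s", ORS          # keep empty lines as they are
        for (i = 1; i <= n; i++)
          printf "%s%s", ((i in zero) ? s : f[i]), (i < n ? OFS : ORS)
      }
    }
  ' "$@"
}

# The five-column sample from the first post, with real TABs.
printf 'A A\tC C\tG G\t0 0\tT T\nA G\tC C\tG G\tA T\t0 0\nG A\t0 0\tG C\tA A\tT C\n' | zero_columns
```

Because the input is read only once, this variant also works on a pipe; the price is that the whole file is held in memory.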

rossi · July 15, 2014, 3:53pm

Hi Rudy,

Yes, that's what I want to achieve.

Scrutinizer · July 15, 2014, 3:58pm

Yes, that is a totally different thing. Elaborating on RudiC's suggestion, try:

awk '{for(i=1; i<=NF; i++) if(NR==FNR) {if($i==s) A[i]} else if(i in A) $i=s}NR>FNR' s="0 0" FS='\t' OFS='\t' file file

The input file is specified twice: awk reads it once to find the columns that contain "0 0" and a second time to rewrite them.

--edit--
This is perhaps easier to read and understand:

awk 'NR==FNR{for(i=1; i<=NF; i++) if($i==s) A[i]; next} {for(i=1; i<=NF; i++) if(i in A) $i=s}1' s="0 0" FS='\t' OFS='\t' file file
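
For anyone who wants to try the two-pass approach, the sketch below runs it against the five-column sample from the first post. It writes the array reference as A[i], so that the later "i in A" test has an index to find, and it assumes the columns are separated by real TABs. The temporary file and the variable names sample and result exist only for this illustration.

```shell
# Sample data from the first post, with real TAB separators.
sample=$(mktemp)
printf 'A A\tC C\tG G\t0 0\tT T\n'  > "$sample"
printf 'A G\tC C\tG G\tA T\t0 0\n' >> "$sample"
printf 'G A\t0 0\tG C\tA A\tT C\n' >> "$sample"

result=$(awk '
  NR==FNR {                                   # first pass over the file
    for(i=1; i<=NF; i++) if($i==s) A[i]       # remember every column that holds s in some row
    next
  }
  {                                           # second pass over the same file
    for(i=1; i<=NF; i++) if(i in A) $i=s      # overwrite the remembered columns
  }
  1                                           # print the line
' s="0 0" FS='\t' OFS='\t' "$sample" "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

On this sample, columns 2, 4 and 5 each contain "0 0" in some row, so those three columns come out as "0 0" in every row, which matches the desired output in the first post.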

wisecracker · July 15, 2014, 4:05pm

Am I missing something here or are the TABS inside the inverted commas NOT the characters?

Copied and pasted into the shell:-

Last login: Tue Jul 15 21:01:35 on ttys000
AMIGA:barrywalker~> printf 'A A"\t"C C"\t"G G"\t"0 0"\t"T T
> A G"\t"C C"\t"G G"\t"A T"\t"0 0
> G A"\t"0 0"\t"G C"\t"A A"\t"T C
> '
A A"	"C C"	"G G"	"0 0"	"T T
A G"	"C C"	"G G"	"A T"	"0 0
G A"	"0 0"	"G C"	"A A"	"T C
AMIGA:barrywalker~> _

Scrutinizer · July 15, 2014, 4:16pm

I interpreted it to mean that "\t" represents a TAB character..
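
If a file really did contain the literal four characters "\t" (quotes, backslash and t) as they appear in the post, a small filter along these lines could turn each occurrence into a real TAB before further processing. This is only a sketch, and the function name literal_to_tabs is made up for the example.

```shell
# Hypothetical helper: replace each literal "\t" (including the quotes) with a real TAB.
literal_to_tabs() {
  awk '{ gsub(/"\\t"/, "\t") } 1' "$@"
}

# printf with %s keeps the backslash sequence literal, as in the pasted post.
printf '%s\n' 'A A"\t"C C"\t"G G"\t"0 0"\t"T T' | literal_to_tabs
```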

rossi · July 15, 2014, 4:21pm

You are right Scrutinizer.
I am bad at posting the file format, as I always get confused about how to do it properly here. Next time, I will do it more carefully.

But the last script from you works perfectly fine. It's wonderful.

Thank you so much to you and thanks everyone for the suggestions as well.

Scrutinizer · July 15, 2014, 4:27pm

As long as you use code tags (see moderator comment in post #1), you should be fine...

You're welcome