Replacing characters in a file

Suppose I have a file with 1000 columns (5 shown as an example).
Within each column the two letters are separated by a space, and the columns are separated by a tab.

A A"\t"C C"\t"G G"\t"0 0"\t"T T
A G"\t"C C"\t"G G"\t"A T"\t"0 0
G A"\t"0 0"\t"G C"\t"A A"\t"T C

Whenever there is a 0 0 in any column, the output should be printed as:

A A"\t"0 0"\t"G G"\t"0 0"\t"0 0
A G"\t"0 0"\t"G G"\t"0 0"\t"0 0
G A"\t"0 0"\t"G C"\t"0 0"\t"0 0

Can someone tell me how to solve this problem for 1000 columns?

Thank you in advance.

Hi, try:

awk '$0~s{$2=s; for(i=4; i<=NF; i++) $i=s}1' s="0 0" FS='\t' OFS='\t' file

Not clear. What are the columns? Are those columns space separated or TAB separated? Your sample shows five, not four columns. Will Col three always be retained? What about Col 1000?

Please specify more carefully in English.

Hi

Sorry, the example has 5 columns.
Let's take the 1st row.
There is a space between A A, tab 0 0, tab G G, tab 0 0, tab T T.
Column three won't always be retained.

Hi,

Thank you so much, I tested it on the above example and it works.
Can you explain the code a bit? For example, why i=4 and $2=s?

Thanks once again

Hi, sure:

awk '
  $0~s {                         # if the line contains the search string in variable s
    $2=s                         # set the second field to that string 
    for(i=4; i<=NF; i++) $i=s    # and set field 4 and higher to that string (thus leaving only field 1 and 3 unchanged)
  }
  1                              # print the line
' s="0 0" FS='\t' OFS='\t' file  # set s (the search string variable) to "0 0"
                                 # and set both input and output field separators to TAB

Hi

I understand now from the comments what i=4 does in this example.

But what if I don't know the exact positions of the columns that contain 0 0 in the 1000-column file? For example, suppose column 4 doesn't have 0 0, column 5 has 0 0, columns 6 and 7 have none, column 8 has 0 0, and so on.

What needs to be done in this case?
I hope that explains the situation a bit.

Thanks

Let me guess: Is the requirement that if a column has "0 0" in any row, the entire column should become "0 0"? That would be an entirely different approach. You'd need to read the entire file to decide if any column goes zero and then print out all rows/columns.
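
A rough single-pass sketch of the approach described above: hold the whole table in memory, note which columns ever contain "0 0", and print everything from the END block. This is only an illustration, not a command posted in this thread. The array names cell and zero, the variable maxnf and the file name sample.txt are made up, and it assumes the file is small enough to fit in memory.

```shell
# Hypothetical sample file with real TAB separators (sample.txt is an assumed name).
printf 'A A\tC C\tG G\t0 0\tT T\n'  > sample.txt
printf 'A G\tC C\tG G\tA T\t0 0\n' >> sample.txt
printf 'G A\t0 0\tG C\tA A\tT C\n' >> sample.txt

result=$(awk -v s="0 0" '
BEGIN { FS = OFS = "\t" }
{
  if (NF > maxnf) maxnf = NF          # remember the widest row
  for (i = 1; i <= NF; i++) {
    cell[NR, i] = $i                  # keep every cell for later printing
    if ($i == s) zero[i]              # referencing zero[i] marks column i
  }
}
END {
  for (r = 1; r <= NR; r++) {
    line = ""
    for (i = 1; i <= maxnf; i++) {
      val = (i in zero) ? s : cell[r, i]   # marked columns become "0 0"
      line = line (i > 1 ? OFS : "") val
    }
    print line
  }
}' sample.txt)

printf '%s\n' "$result"
```

Since this keeps every cell in memory, reading the file twice instead may be the lighter option for very large inputs.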

Hi Rudy,

Yes, that's exactly what I want to achieve.

Yes, that is a totally different thing. Elaborating on RudiC's suggestion, try:

awk '{for(i=1; i<=NF; i++) if(NR==FNR) {if($i==s) A[i]} else if(i in A) $i=s}NR>FNR' s="0 0" FS='\t' OFS='\t' file file

The input file is specified twice: awk reads it once to find the columns that contain "0 0" and a second time to rewrite them.

--edit--
This is perhaps easier to read / understand :

awk 'NR==FNR{for(i=1; i<=NF; i++) if($i==s) A[i]; next} {for(i=1; i<=NF; i++) if(i in A) $i=s}1' s="0 0" FS='\t' OFS='\t' file file
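
To try the two-pass idea on the sample from the first post, something like the following should do. It is a minimal sketch: the file name sample.txt and the variable result are made up for the illustration, and the sample is assumed to use real TAB characters between columns.

```shell
# Hypothetical sample file with real TAB separators (sample.txt is an assumed name).
printf 'A A\tC C\tG G\t0 0\tT T\n'  > sample.txt
printf 'A G\tC C\tG G\tA T\t0 0\n' >> sample.txt
printf 'G A\t0 0\tG C\tA A\tT C\n' >> sample.txt

result=$(awk '
  # First pass (NR==FNR): record every column number that holds s in any row.
  # A[i] on its own is enough: merely referencing the element creates it.
  NR == FNR { for (i = 1; i <= NF; i++) if ($i == s) A[i]; next }
  # Second pass: overwrite the recorded columns, then print the line.
  { for (i = 1; i <= NF; i++) if (i in A) $i = s }
  1
' s="0 0" FS='\t' OFS='\t' sample.txt sample.txt)

printf '%s\n' "$result"
```

On this sample, columns 2, 4 and 5 each contain "0 0" in some row, so those three columns should come out as "0 0" on every line.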

Am I missing something here or are the TABS inside the inverted commas NOT the characters?

Copied and pasted into the shell:-

Last login: Tue Jul 15 21:01:35 on ttys000
AMIGA:barrywalker~> printf 'A A"\t"C C"\t"G G"\t"0 0"\t"T T
> A G"\t"C C"\t"G G"\t"A T"\t"0 0
> G A"\t"0 0"\t"G C"\t"A A"\t"T C
> '
A A"	"C C"	"G G"	"0 0"	"T T
A G"	"C C"	"G G"	"A T"	"0 0
G A"	"0 0"	"G C"	"A A"	"T C
AMIGA:barrywalker~> _

I interpreted it to mean that "\t" represents a TAB character..
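
For anyone who wants to reproduce the sample with real TAB characters, a minimal sketch, assuming a POSIX printf and sed. The file name sample.txt is made up for the illustration.

```shell
# printf expands \t in its format string to a TAB character.
# The double quotes around \t in the first post are dropped here, since they
# would end up in the file as literal quote characters.
printf 'A A\tC C\tG G\t0 0\tT T\n'  > sample.txt
printf 'A G\tC C\tG G\tA T\t0 0\n' >> sample.txt
printf 'G A\t0 0\tG C\tA A\tT C\n' >> sample.txt

# sed -n l prints each line in an unambiguous form, with TABs shown as \t,
# which makes it easy to check what the separators really are.
sed -n l sample.txt
```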

You are right Scrutinizer.
I am bad at posting file formats, as I always get confused about how to do it properly here. Next time I will do it more carefully.

But the last script from you works perfectly fine. It's wonderful.

Thank you so much, and thanks to everyone for the suggestions as well.

As long as you use code tags (see the moderator comment in post #1), you should be fine...

You're welcome :)