awk and gsub - how to replace only the first X occurrences

bingel · September 24, 2010, 11:35am

I have a text (text.txt) and I would like to replace only the first 2 occurrences of a word (but I might need to replace more):

For example, if text is this:

CAR sweet head
hat red yellow
CAR book brown
tiger CAR cow CAR
CAR milk

I would like to replace the word "CAR" with the word "REPLACE" only in rows 1 and 3, not in rows 4 and 5, so that the result looks like this:

REPLACE sweet head
hat red yellow
REPLACE book brown
tiger CAR cow CAR
CAR milk

but if I use:

cat text.txt | awk '{gsub("CAR","REPLACE");print}'

I will obtain this:

REPLACE sweet head
hat red yellow
REPLACE book brown
tiger REPLACE cow REPLACE
REPLACE milk

Is there a way to obtain what I need (if possible using awk and gsub)?

Thanks in advance.

kurumi · September 24, 2010, 11:39am

$ ruby -00 -ne 'print $_.split("CAR",3).join("REPLACE")' file
REPLACE sweet head
hat red yellow
REPLACE book brown
tiger CAR cow CAR
CAR milk

quirkasaurus · September 24, 2010, 11:40am


cat text.txt |
awk '/CAR/{
  if ( count < 2 ){ gsub("CAR","REPLACE")}
  count++
  print
  }'

...oh... and you can pass in the 2 as a variable:


cat text.txt |
awk '/CAR/{
  if ( count < counter ){ gsub("CAR","REPLACE")}
  count++
  print
  }' counter=$counter

bingel · September 24, 2010, 11:59am

Thanks, but using awk and gsub?

I don't know Ruby, and I would like to use awk so that I can adapt the script to my needs.

Thanks again

---------- Post updated at 04:44 PM ---------- Previous update was at 04:43 PM ----------

Thanks again, I had not seen the last message

---------- Post updated at 04:55 PM ---------- Previous update was at 04:44 PM ----------

@quirkasaurus

I have tested your first code but I obtain this output:

REPLACE sweet head
REPLACE book brown
tiger CAR cow CAR
CAR milk

The second row is missing.

---------- Post updated at 04:59 PM ---------- Previous update was at 04:55 PM ----------

With this code it works:

cat text.txt | awk '{if ( count < 3 ){ gsub("CAR","REPLACE")} count++; print}'

quirkasaurus · September 24, 2010, 12:01pm

oops. sorry. didn't notice that.

cat text.txt |
awk '{
  if ( count < 2 ){
    gsub("CAR","REPLACE")
    count++
    }
  print
  }'

output:
REPLACE sweet head
hat red yellow
CAR book brown
tiger CAR cow CAR
CAR milk

bingel · September 24, 2010, 12:05pm

I need code like this because the file I have to search is big, and I would like to stop searching once the first 2 occurrences have been found. With gsub I think the search continues to the end of the file, whereas with this code (using sub in place of gsub):

cat text.txt | awk '{sub("CAR","REPLACE");print}'

the search stops as soon as awk finds the occurrence. Is that right?

pravin27 · September 24, 2010, 12:12pm

quirkasaurus,
your code does not replace the 2nd occurrence.

A slight change to your code:

awk '/CAR/ && count < 2  {gsub("CAR","REPLACE");count++} {print $0}' infile

bingel · September 24, 2010, 12:17pm

As I said, the file is very large, so I would like to stop the search as soon as the first X occurrences have been found. Do you think the search will be faster if I use the sub function instead of gsub?

cat text.txt |
awk '{
  if ( count < 2 ){
    sub("CAR","REPLACE")
    count++
    }
  print
  }'

---------- Post updated at 05:17 PM ---------- Previous update was at 05:13 PM ----------

What do you think about this?

cat text.txt | awk '{if ( count < 3 ){ sub("CAR","REPLACE")} count++; print}'

Scott · September 24, 2010, 12:17pm

Hi.

sub replaces the first occurrence on a line; gsub replaces all occurrences on a line, not all occurrences in a file.

awk '/CAR/ {
  if ( count++ < 2 )
    gsub("CAR","REPLACE")
  }1' file
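
To make the per-line behaviour concrete, here is a minimal sketch that runs both functions on a single line taken from the sample file. The helper names first_only and all_on_line are made up for this illustration.

```shell
# Hypothetical helpers; each one feeds a single line to awk.
first_only()  { printf '%s\n' "$1" | awk '{ sub(/CAR/, "REPLACE"); print }'; }
all_on_line() { printf '%s\n' "$1" | awk '{ gsub(/CAR/, "REPLACE"); print }'; }

line='tiger CAR cow CAR'
first_only "$line"     # tiger REPLACE cow CAR
all_on_line "$line"    # tiger REPLACE cow REPLACE
```

Either way awk goes on to read the next line afterwards; neither function stops processing the file.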

pravin27 · September 24, 2010, 12:20pm

If you exit from the script after X occurrences, you will only get output up to that record.

 awk '/CAR/ && count < 2  {gsub("CAR","REPLACE");count++} {print $0;if(count == 2) {exit}}'  infile

Output:

REPLACE sweet head
hat red yellow
REPLACE book brown
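
If the remaining lines should still be printed, one possible variation is to test the counter before the pattern: awk evaluates && left to right, so once the limit is reached the regex is no longer tried. This is only a sketch; the function name replace_first_lines and its arguments are assumptions for illustration, and like the code above it counts matching lines, not individual occurrences.

```shell
# Hypothetical wrapper: $1 = number of matching lines to change, $2 = input file.
replace_first_lines() {
  awk -v max="$1" '
    # The counter is checked first; when n >= max the regex is not evaluated.
    n < max && /CAR/ { gsub(/CAR/, "REPLACE"); n++ }
    # Every line is printed, changed or not.
    { print }
  ' "$2"
}
```

Called as replace_first_lines 2 text.txt, this should change rows 1 and 3 of the sample and leave rows 4 and 5 alone. awk still reads every line of the file; it only skips the search on them.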

Franklin52 · September 24, 2010, 1:05pm

I haven't tested the Ruby version, but all the awk solutions above fail if you want to replace 2 occurrences in a file like this:

CAR sweet CAR woman CAR
CAR red yellow
CAR book brown
tiger CAR cow CAR
CAR milk

Try something like this:

awk '{while(index($0,"CAR") && ++n < 3){sub("CAR","REPLACE")}}1' file
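
The same idea can be sketched with the limit passed in as a variable, using the return value of sub (1 when it replaced something, 0 otherwise) to count occurrences. The function name replace_first_n is hypothetical, and the sketch assumes the replacement text does not itself contain the pattern, otherwise sub would match its own output again.

```shell
# Hypothetical wrapper: $1 = number of occurrences to replace, $2 = input file.
replace_first_n() {
  awk -v max="$1" '
    {
      # sub() changes one occurrence per call; loop until the line has
      # no more matches or the overall limit has been reached.
      while (n < max && sub(/CAR/, "REPLACE")) n++
      print
    }
  ' "$2"
}
```

With the file above and a limit of 2, both replacements land on the first line and every later line is printed unchanged.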

kurumi · September 24, 2010, 8:04pm

Yes, the Ruby version does produce the correct output, since it splits with a limit. However, the regex needs to change to \bCAR\b if a word boundary is required. Here is another awk way, assuming CAR does not need to be bounded:

 
$ awk 'BEGIN{RS="CAR";ORS="REPLACE"}NR>2{ORS="CAR"}1' file
REPLACE sweet REPLACE woman CAR
CAR red yellow
CAR book brown
tiger CAR cow CAR
CAR milk

shamrock · September 24, 2010, 8:46pm

Yet another way of doing the same...

awk '{
  if (cnt < 2)
    for (i=1;i<=NF;i++)
      if ($i=="CAR" && cnt<2)
        cnt += sub("CAR","REPLACE",$i)
   print
}' file