Substitute the first occurrence of a keyword if it occurs between two other keywords

Assume a string that contains one or more occurrences of three different keywords (abbreviated as "kw"). I would like to replace kw2 with some other string, say "qux". Specifically, I would like to replace the first occurrence of kw2 that is preceded by kw1 somewhere earlier in the string (kw1 need not be adjacent to kw2) and followed by kw3 somewhere later in the string (kw3 need not be adjacent to kw2).

Examples:

> echo "foo kw1 bar kw2 baz kw3 kw2 baz kw3" | sed ...
# Desired output: foo kw1 bar qux baz kw3 kw2 baz kw3
> echo "foo kw2 bar kw1 bar kw2 baz kw3" | sed ...
# Desired output: foo kw2 bar kw1 bar qux baz kw3

Got Perl?

$ cat m_gruenstaeudl.examples
foo kw1 bar kw2 baz kw3 kw2 baz kw3
foo kw2 bar kw1 bar kw2 baz kw3
foo kw2 bar kw1 kw2 baz kw3
foo kw2 bar kw1 kw2 kw3
$ perl -ple 's/(kw1.*?)kw2(.*?kw3)/$1qux$2/' m_gruenstaeudl.examples
foo kw1 bar qux baz kw3 kw2 baz kw3
foo kw2 bar kw1 bar qux baz kw3
foo kw2 bar kw1 qux baz kw3
foo kw2 bar kw1 qux kw3

Or take the scenic route with awk...

$ 
$ # Show the data file
$ cat f33
foo kw1 bar kw2 baz kw3 kw2 baz kw3
foo kw2 bar kw1 bar kw2 baz kw3
foo kw2 bar kw1 kw2 baz kw3
foo kw2 bar kw1 kw2 kw3
foo kw1 bar kw1 kw2 kw3 kw1 kw2 kw3
kw1 kw2 kw3 kw1 kw2 kw3
kw1 kw2 kw3
kw1 kw2 kw4
kw1 kw2 kw4 foo bar buzz
foo kw1 kw2 kw2 kw2 kw3 bar
$ 
$ # Show the awk script
$ cat f33.awk
BEGIN {TIMES = 0}
{
     for (i=1; i<=NF; i++) {
         if ($i == "kw1" && TIMES == 0) {
             # first kw1 on this line: print it, set IN, start with an empty partial array
             printf("%s ", $i)
             IN = 1
             j = 0
             TIMES++
         } else if (IN == 1) {
             # if we are here, TIMES=1 always, and we'll be here only once per "kw1..." pattern per line
             if ($i == "kw2") {
                 # kw2 found: add to partial array, do not print
                 j++
                 partial[j] = $i
             } else if ($i == "kw3" && j > 0) {
                 # kw3 found after a buffered kw2: replace 1st element, print array, reset IN, bump TIMES
                 partial[1] = "qux"
                 for (k=1; k<=j; k++) {
                     printf("%s ",partial[k])
                     delete partial[k]
                 }
                 j = 0
                 printf("%s ", $i)
                 TIMES++
                 IN = 0
             } else if (j > 0) {
                 # j > 0: partial array has been initialized; add to partial array, do not print
                 j++
                 partial[j] = $i
             } else {
                 # j == 0 and field is something other than kw2 (a kw3 before any kw2 included): print it
                 printf("%s ", $i)
             }
         } else {
             # Either TIMES=0 or > 1: print it
             printf("%s ", $i)
         }
     }
     # if partial array was initialized, because kw2 was found, but kw3 was never found,
     # then print the contents of the partial array and flush it
     if (j > 0) {
         for (k=1; k<=j; k++) {
             printf("%s ",partial[k])
             delete partial[k]
         }
     }
     # we are done with this line; reset variables and repeat
     printf("\n")
     IN = 0
     TIMES = 0
     j = 0
}
$ 
$ # Run the awk script
$ awk -f f33.awk f33
foo kw1 bar qux baz kw3 kw2 baz kw3 
foo kw2 bar kw1 bar qux baz kw3 
foo kw2 bar kw1 qux baz kw3 
foo kw2 bar kw1 qux kw3 
foo kw1 bar kw1 qux kw3 kw1 kw2 kw3 
kw1 qux kw3 kw1 kw2 kw3 
kw1 qux kw3 
kw1 kw2 kw4 
kw1 kw2 kw4 foo bar buzz 
foo kw1 qux kw2 kw2 kw3 bar 
$ 
$ 
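
For comparison, a shorter awk route is possible if plain substring matching is acceptable (the script above compares whole fields, so it would ignore something like "kw2x", while this sketch and the Perl regex would not). The idea: find the first kw1 with `index()`, then the first kw2 after it, then check that a kw3 exists somewhere after that kw2. This is only a sketch under that assumption; the function name `replace_first_kw2` is made up for illustration.

```shell
# Hypothetical function name; reads lines on stdin.
replace_first_kw2() {
    awk '{
        a = index($0, "kw1")                  # position of the first kw1, 0 if none
        if (a) {
            rest = substr($0, a + 3)          # everything after that kw1
            b = index(rest, "kw2")            # first kw2 after kw1, 0 if none
            # replace only if a kw3 shows up somewhere after that kw2
            if (b && index(substr(rest, b + 3), "kw3"))
                $0 = substr($0, 1, a + 2) substr(rest, 1, b - 1) "qux" substr(rest, b + 3)
        }
        print
    }'
}

printf '%s\n' 'foo kw1 bar kw2 baz kw3 kw2 baz kw3' | replace_first_kw2
# foo kw1 bar qux baz kw3 kw2 baz kw3
```

Checking only the first kw2 after kw1 is enough: if that one has no kw3 after it, no later kw2 on the line can have one either. Unlike the field-based script, this version leaves the original spacing alone and adds no trailing blank.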

Hi M Gruenstaeudl,
Your specification is a little bit ambiguous. If all three keywords are found in order in an input line more than once (as in the line:

kw1 kw2 kw3 kw1 kw2 kw3

in durden tyler's sample input), do you want kw2 to be replaced only in the first set of three keywords on the line, which is what Aia's Perl script and durden tyler's awk script produce:

kw1 qux kw3 kw1 kw2 kw3

or did you want the 1st occurrence of kw2 to be replaced in each set of 3 keywords:

kw1 qux kw3 kw1 qux kw3

?

@Aia: Yes, perl seems to be the way to go here, not sed. Thanks for your answer. It answered my question to the point.

perl -pi -le 's/(kw1.*?)kw2(.*?kw3)/$1qux$2/' infile