I have a large text file with multiple similar patterns on each line like:
blank">PATTERN1 some word PATTERN2
title=">PATTERN1 some word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 another word PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 one more time PATTERN2
I would like to convert all occurrences between title=">PATTERN1 and PATTERN2 to Title Case format.
Desired output:
blank">PATTERN1 some word PATTERN2
title=">PATTERN1 Some Word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 Another Word PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 One More Time PATTERN2
Any idea on accomplishing this is much appreciated. Thanks very much for your help!
blank">PATTERN1 some word PATTERN2
title=">PATTERN1 some word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 another word PATTERN2 and leave this alone PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 one more time but not last PATTERN2
Please give it a try:
perl -ple '/title=">PATTERN1\s+(.*?)\s+PATTERN2/ and $s=$1 and s|$s|join " ", (map {ucfirst} split /\s/, $s)|e' patterns.file
blank">PATTERN1 some word PATTERN2
title=">PATTERN1 Some Word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 Another Word PATTERN2 and leave this alone PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 One More Time But Not Last PATTERN2
The following awk solution may also help; let me know if it works for you.
awk '/^title=\">PATTERN1.*PATTERN2$/{A=$1;for(i=2;i<NF;i++){A=A OFS toupper(substr($i,1,1)) substr($i,2)};print A OFS $NF;A="";next} {print}' Input_file
Output will be as follows.
blank">PATTERN1 some word PATTERN2
title=">PATTERN1 Some Word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 Another Word PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 One More Time PATTERN2
On a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
Both solutions work great, but I ran into a small issue: they seem to replace only the first occurrence on a line and ignore the others.
I tried the following sample line with tokyo city, london and new york, all of which I needed in Title Case.
title=">PATTERN1 tokyo city PATTERN2 whatever whatever title=">PATTERN1 london PATTERN2 whatever whatever title=">PATTERN1 new york PATTERN2 whatever whatever ....
I only got Tokyo City in Title Case:
title=">PATTERN1 Tokyo City PATTERN2 whatever whatever title=">PATTERN1 london PATTERN2 whatever whatever title=">PATTERN1 new york PATTERN2 whatever whatever ....
I was hoping to get every single occurrence between those patterns in Title Case, like Tokyo City, London and New York:
title=">PATTERN1 Tokyo City PATTERN2 whatever whatever title=">PATTERN1 London PATTERN2 whatever whatever title=">PATTERN1 New York PATTERN2 whatever whatever ....
And, in those cases, what is supposed to happen? Does capitalization start with the first string and continue until the second string is seen on a subsequent line, or is capitalization only performed between an occurrence of the first string and the next occurrence of the second string on the same line?
And, on a line like:
title=">PATTERN1 tokyo city whatever whatever title=">PATTERN1 london PATTERN2
is the output supposed to be:
title=">PATTERN1 Tokyo City Whatever Whatever Title=">PATTERN1 London PATTERN2
or:
title=">PATTERN1 tokyo city whatever whatever title=">PATTERN1 London PATTERN2
That's the result I needed (with capitalization only performed between an occurrence of the first string up to the next occurrence). Thank you, Don!
The perl solution from Aia solved the issue.
PATTERN2 whatever goes to tokyo city ....title=">PATTERN1 tokyo city PATTERN2 whatever whatever whatever title=">PATTERN1 london PATTERN2 whatever whatever whatever title=">PATTERN1 new york PATTERN2 whatever whatever whatever ....
PATTERN2 whatever goes to tokyo city ....title=">PATTERN1 Tokyo City PATTERN2 whatever whatever whatever title=">PATTERN1 London PATTERN2 whatever whatever whatever title=">PATTERN1 New York PATTERN2 whatever whatever whatever
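For anyone without perl at hand, the requirement as clarified here (every pair on a line, same line only, and only from the nearest opening string to the next closing string) can be sketched in plain awk. This is only a rough sketch under assumptions, not the script that was posted in the thread: the function name titlecase_between and the variables ON and OFF are made up for illustration, and the markers are matched as substrings, so an opening string glued to preceding text (as in ....title=">PATTERN1) still counts.

```shell
# Hypothetical helper (not from the thread): title-case the text between
# each opening marker and the next closing marker on the same line.
titlecase_between() {
  awk -v ON='title=">PATTERN1' -v OFF='PATTERN2' '
  # Uppercase every character that follows a blank (or starts the string).
  function tc(s,    i, c, prev, out) {
    prev = " "
    out = ""
    for (i = 1; i <= length(s); i++) {
      c = substr(s, i, 1)
      if ((prev == " " || prev == "\t") && c != " " && c != "\t")
        c = toupper(c)
      out = out c
      prev = c
    }
    return out
  }
  {
    rest = $0
    out = ""
    while ((p = index(rest, ON)) > 0) {
      # Copy everything up to and including the opening marker unchanged.
      out = out substr(rest, 1, p + length(ON) - 1)
      rest = substr(rest, p + length(ON))
      q = index(rest, OFF)
      if (q == 0)
        break                      # no closing marker left on this line
      r = index(rest, ON)
      if (r > 0 && r < q) {
        # A second opening marker comes first: leave this stretch alone
        # and restart from the later marker.
        out = out substr(rest, 1, r - 1)
        rest = substr(rest, r)
        continue
      }
      out = out tc(substr(rest, 1, q - 1)) OFF
      rest = substr(rest, q + length(OFF))
    }
    print out rest
  }' "$@"
}
```

It could be run as titlecase_between patterns.file, or fed from a pipe. Because it copies the line piece by piece instead of reassigning fields, the original spacing should be preserved.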
You said that, given the input:
title=">PATTERN1 tokyo city whatever whatever title=">PATTERN1 london PATTERN2
you want the output:
title=">PATTERN1 tokyo city whatever whatever title=">PATTERN1 London PATTERN2
and you said that Aia's perl script does exactly what you want. But both of Aia's perl scripts produce the following output from that input:
title=">PATTERN1 Tokyo City Whatever Whatever Title=">PATTERN1 London PATTERN2
I'm not nearly as good at writing perl scripts as Aia is, but the following awk script seems to do what you have requested for every output you have specified with various sample inputs:
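awk '
BEGIN {
	OFF = "PATTERN2"            # closing marker
	ON = "title=\">PATTERN1"    # opening marker
}
{
	# Record the field range of each ON ... OFF pair on this line.
	n = on = 0
	for(i = 1; i <= NF; i++)
		if($i == ON) {
			if(!on) n++
			on = 1
			start[n] = i + 1
		} else if($i == OFF) {
			if(on) {
				on = 0
				stop[n] = i - 1
			}
		}
	if(on) n--	# discard an ON that has no matching OFF
	# Capitalize the first letter of every field inside each recorded range.
	for(i = 1; i <= n; i++)
		for(j = start[i]; j <= stop[i]; j++)
			$j = toupper(substr($j, 1, 1)) substr($j, 2)
}
1' file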
As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk. I'm guessing this still doesn't really do what you want, but you still haven't given us a complete specification for what happens when the three patterns you showed us in your original post in this thread are not paired as shown in the examples in that post.