Convert text between exact matching patterns to Title case

Hi Folks,

I have a large text file with multiple similar patterns on each line like:

blank">PATTERN1 some word PATTERN2
title=">PATTERN1 some word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 another word PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 one more time PATTERN2

I would like to convert all occurrences between title=">PATTERN1 and PATTERN2 to Title Case format.

Desired output:

blank">PATTERN1 some word PATTERN2
title=">PATTERN1 Some Word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 Another Word PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 One More Time PATTERN2

Any idea on accomplishing this is much appreciated. Thanks very much for your help!

cat patterns.file
blank">PATTERN1 some word PATTERN2
title=">PATTERN1 some word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 another word PATTERN2 and leave this alone PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 one more time but not last PATTERN2

Please, give it a try:

perl -ple '/title=">PATTERN1\s+(.*?)\s+PATTERN2/ and $s=$1 and s|$s|join " ", (map {ucfirst} split /\s/, $s)|e' patterns.file
blank">PATTERN1 some word PATTERN2
title=">PATTERN1 Some Word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 Another Word PATTERN2 and leave this alone PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 One More Time But Not Last PATTERN2

Hi Aia, it works exactly as I was hoping! Thank you very much for this solution.

Hello martinsmith,

The following awk solution may also help; let me know if it does.

awk '/^title=\">PATTERN1.*PATTERN2$/{A=$1;for(i=2;i<NF;i++){A=A OFS toupper(substr($i,1,1)) substr($i,2)};print A OFS $NF;A="";next} {print}'  Input_file

Output will be as follows.

blank">PATTERN1 some word PATTERN2
title=">PATTERN1 Some Word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 Another Word PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 One More Time PATTERN2
 

On a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk .

Thanks,
R. Singh
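
Note that the awk pattern above only matches lines that start with title=">PATTERN1 and end with PATTERN2, and its loop capitalises every field up to the last one. On a line such as the fourth line of patterns.file, the words after the first PATTERN2 would be capitalised too. Below is a minimal sketch, assuming the marker is the first field on the line and capitalisation should stop at the first PATTERN2 field; the function name titlecase_first_pair is made up for illustration.

```shell
# Hypothetical helper: title-case the fields between a leading
# title=">PATTERN1 field and the FIRST PATTERN2 field on the line.
# Assumption: lines without both markers are printed unchanged.
# Caveat: assigning to a field makes awk rebuild the line with single spaces.
titlecase_first_pair() {
    awk '
    $1 == "title=\">PATTERN1" {
        stop = 0
        # Locate the first closing marker, if there is one.
        for (i = 2; i <= NF; i++)
            if ($i == "PATTERN2") { stop = i; break }
        # Capitalise only the fields strictly between the two markers.
        for (i = 2; i < stop; i++)
            $i = toupper(substr($i, 1, 1)) substr($i, 2)
    }
    { print }
    ' "$@"
}
```

With the patterns.file shown earlier, titlecase_first_pair patterns.file should leave "and leave this alone PATTERN2" untouched on the fourth line while still printing "Another Word" in Title Case.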


Thank you Aia and R.Singh.

Both solutions work great, but I ran into a small issue: they seem to replace only the first occurrence on a line and ignore the others.

I tried the following sample line with tokyo city, london and new york, all of which I need in Title Case.

title=">PATTERN1 tokyo city PATTERN2 whatever whatever title=">PATTERN1 london PATTERN2 whatever whatever title=">PATTERN1 new york PATTERN2 whatever whatever ....

I only got Tokyo City in Title case

title=">PATTERN1 Tokyo City PATTERN2 whatever whatever title=">PATTERN1 london PATTERN2 whatever whatever title=">PATTERN1 new york PATTERN2 whatever whatever ....

I was hoping to get every single occurrence between those patterns in Title case like: Tokyo City, London and New York

title=">PATTERN1 Tokyo City PATTERN2 whatever whatever title=">PATTERN1 London PATTERN2 whatever whatever title=">PATTERN1 New York PATTERN2 whatever whatever ....

Any ideas? Thank you for your time.

Will your input ever contain the string title=">PATTERN1 without a matching string PATTERN2 on the same line, such as one of the following?:

title=">PATTERN1 Tokyo City whatever whatever title=">PATTERN1 London PATTERN2 whatever whatever

or:

title=">PATTERN1 Tokyo City PATTERN2 whatever whatever title=">PATTERN1 London whatever whatever.

Hi Don,

Yes, it will. Sometimes PATTERN2 also appears before title=">PATTERN1.

Example:

PATTERN2 whatever whatever ..... title=">PATTERN1 tokyo city PATTERN2 whatever whatever title=">PATTERN1 London PATTERN2 whatever whatever .....title=">PATTERN1 whatever.......... 

And, in those cases, what is supposed to happen? Does capitalization start with the first string and continue until the second string is seen on a subsequent line, or is capitalization only performed between an occurrence of the first string and the next occurrence of the second string on the same line?

And, on a line like:

title=">PATTERN1 tokyo city whatever whatever title=">PATTERN1 london PATTERN2

is the output supposed to be:

title=">PATTERN1 Tokyo City Whatever Whatever Title=">PATTERN1 London PATTERN2

or:

title=">PATTERN1 tokyo city whatever whatever title=">PATTERN1 London PATTERN2

Would any of these do it?

perl -ple '$cl=$_; while(/title=">PATTERN1\s+(.*?)\s+PATTERN2/g){$opt=$mpt=$&; $os=$ms=$1; $ms=~s/\b(\w)/\U$1/g; $mpt=~s/$os/$ms/; $cl=~s/$opt/$mpt/}$_=$cl' patterns.file
perl -ne '$cl=$_; while(/title=">PATTERN1\s+(.*?)\s+PATTERN2/g){$opt=$mpt=$&; $os=$ms=$1; $ms=~s/\b(\w)/\U$1/g; $mpt=~s/$os/$ms/; $cl=~s/$opt/$mpt/} print $cl' patterns.file

That's the result I needed (with capitalization only performed between an occurrence of the first string up to the next occurrence). Thank you, Don!
The perl solution from Aia solved the issue.

PATTERN2 whatever goes to tokyo city ....title=">PATTERN1 tokyo city PATTERN2 whatever whatever whatever title=">PATTERN1 london PATTERN2 whatever whatever whatever title=">PATTERN1 new york PATTERN2 whatever whatever whatever ....
perl -ple '$cl=$_; while(/title=">PATTERN1\s+(.*?)\s+PATTERN2/g){$opt=$mpt=$&; $os=$ms=$1; $ms=~s/\b(\w)/\U$1/g; $mpt=~s/$os/$ms/; $cl=~s/$opt/$mpt/};$_=$cl' patterns.file

Result:

PATTERN2 whatever goes to tokyo city ....title=">PATTERN1 Tokyo City PATTERN2 whatever whatever whatever title=">PATTERN1 London PATTERN2 whatever whatever whatever title=">PATTERN1 New York PATTERN2 whatever whatever whatever

Cheers Aia!

Thanks a lot for everyone's help.

I'm confused.

You said that with the input:

title=">PATTERN1 tokyo city whatever whatever title=">PATTERN1 london PATTERN2

you want the output:

title=">PATTERN1 tokyo city whatever whatever title=">PATTERN1 London PATTERN2

and you said that Aia's perl script does exactly what you want. But both of Aia's perl scripts produce the following output from that input:

title=">PATTERN1 Tokyo City Whatever Whatever Title=">PATTERN1 London PATTERN2

I'm not nearly as good at writing perl scripts as Aia is, but the following awk script seems to do what you have requested for every output you have specified with various sample inputs:

awk '
BEGIN {	OFF = "PATTERN2"
	ON = "title=\">PATTERN1"
}
{	n = on = 0
	for(i = 1; i <= NF; i++)
		if($i == ON) {
			if(!on)	n++
			on = 1
			start[n] = i + 1
		} else if($i == OFF) {
			if(on) {
				on = 0
				stop[n] = i - 1
			}
		}
	if(on)	n--
	for(i = 1; i <= n; i++)
		for(j = start[i]; j <= stop[i]; j++)
			$j = toupper(substr($j, 1, 1)) substr($j, 2)
}
1' file

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk . I'm guessing this still doesn't really do what you want, but you still haven't given us a complete specification for what happens when the three patterns you showed us in your original post in this thread are not paired as shown in the examples in that post.
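
One more caveat: the awk script above compares whole whitespace-separated fields against the two markers, so a marker attached to other text (as in ....title=">PATTERN1 from the earlier sample) is not treated as a marker. A rough sketch that searches for the markers as substrings with index() instead is shown below. It assumes that pairing happens only within a single line, that an opening marker with no closing marker after it is left alone, and that when two opening markers precede one PATTERN2 only the text after the later one is capitalised. The function name titlecase_between is made up for illustration.

```shell
# Hypothetical helper: title-case the text between each title=">PATTERN1
# and the next PATTERN2 on the same line, matching markers as substrings.
titlecase_between() {
    awk -v on='title=">PATTERN1' -v off='PATTERN2' '
    # Uppercase every character that follows a blank or starts the string.
    function cap(s,    i, c, prev, res) {
        prev = " "
        res = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            res = res ((prev == " " || prev == "\t") ? toupper(c) : c)
            prev = c
        }
        return res
    }
    {
        rest = $0
        out = ""
        while ((p = index(rest, on)) > 0) {
            # Copy everything up to and including the opening marker.
            out = out substr(rest, 1, p + length(on) - 1)
            rest = substr(rest, p + length(on))
            q = index(rest, off)
            if (q == 0) break              # no closing marker: stop here
            r = index(rest, on)
            if (r > 0 && r < q) continue   # a later opening marker wins
            out = out cap(substr(rest, 1, q - 1)) off
            rest = substr(rest, q + length(off))
        }
        print out rest
    }
    ' "$@"
}
```

Run as titlecase_between file. Because it works on substrings rather than fields, the original spacing of each line is preserved.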