Convert text between exact matching patterns to Title case

martinsmith · February 10, 2016, 6:42pm

Hi Folks,

I have a large text file with multiple similar patterns on each line like:

blank">PATTERN1 some word PATTERN2
title=">PATTERN1 some word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 another word PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 one more time PATTERN2

I would like to convert all occurrences between title=">PATTERN1 and PATTERN2 to Title Case format.

Desired output:

blank">PATTERN1 some word PATTERN2
title=">PATTERN1 Some Word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 Another Word PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 One More Time PATTERN2

Any idea on accomplishing this is much appreciated. Thanks very much for your help!

Aia · February 10, 2016, 7:59pm

cat patterns.file

blank">PATTERN1 some word PATTERN2
title=">PATTERN1 some word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 another word PATTERN2 and leave this alone PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 one more time but not last PATTERN2

Please, give it a try:

perl -ple '/title=">PATTERN1\s+(.*?)\s+PATTERN2/ and $s=$1 and s|$s|join " ", (map {ucfirst} split /\s/, $s)|e' patterns.file

blank">PATTERN1 some word PATTERN2
title=">PATTERN1 Some Word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 Another Word PATTERN2 and leave this alone PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 One More Time But Not Last PATTERN2

martinsmith · February 10, 2016, 9:52pm

Hi Aia, it works perfectly, just as I was hoping! Thank you very much for this solution.

RavinderSingh13 · February 11, 2016, 12:08am

Hello martinsmith,

The following awk solution may also help; let me know if it works for you.

awk '/^title=\">PATTERN1.*PATTERN2$/{A=$1;for(i=2;i<NF;i++){A=A OFS toupper(substr($i,1,1)) substr($i,2)};print A OFS $NF;A="";next} {print}'  Input_file

Output will be as follows.

blank">PATTERN1 some word PATTERN2
title=">PATTERN1 Some Word PATTERN2
blank">PATTERN1 another word PATTERN2
title=">PATTERN1 Another Word PATTERN2
blank">PATTERN1 one more time PATTERN2
title=">PATTERN1 One More Time PATTERN2

On a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

Thanks,
R. Singh

martinsmith · February 11, 2016, 10:52am

Thank you Aia and R.Singh.

Both solutions work great, but I ran into a small issue: they seem to replace only the first occurrence on a line and ignore the others.

I tried the following sample line containing tokyo city, london and new york, all of which I need in Title Case.

title=">PATTERN1 tokyo city PATTERN2 whatever whatever title=">PATTERN1 london PATTERN2 whatever whatever title=">PATTERN1 new york PATTERN2 whatever whatever ....

Only Tokyo City came out in Title Case:

title=">PATTERN1 Tokyo City PATTERN2 whatever whatever title=">PATTERN1 london PATTERN2 whatever whatever title=">PATTERN1 new york PATTERN2 whatever whatever ....

I was hoping to get every single occurrence between those patterns in Title case like: Tokyo City, London and New York

title=">PATTERN1 Tokyo City PATTERN2 whatever whatever title=">PATTERN1 London PATTERN2 whatever whatever title=">PATTERN1 New York PATTERN2 whatever whatever ....

Any ideas ? Thank you for your time.

Don_Cragun · February 11, 2016, 4:37pm

Will your input ever contain the string title=">PATTERN1 without a matching string PATTERN2 on the same line, such as one of the following?:

title=">PATTERN1 Tokyo City whatever whatever title=">PATTERN1 London PATTERN2 whatever whatever

or:

title=">PATTERN1 Tokyo City PATTERN2 whatever whatever title=">PATTERN1 London whatever whatever.

martinsmith · February 11, 2016, 10:18pm

Hi Don,

Yes, it will, and sometimes PATTERN2 also appears before title=">PATTERN1.

Example:

PATTERN2 whatever whatever ..... title=">PATTERN1 tokyo city PATTERN2 whatever whatever title=">PATTERN1 London PATTERN2 whatever whatever .....title=">PATTERN1 whatever..........

Don_Cragun · February 11, 2016, 10:28pm

And, in those cases, what is supposed to happen? Does capitalization start with the first string and continue until the second string is seen on a subsequent line, or is capitalization only performed between an occurrence of the first string and the next occurrence of the second string that appears on the same line?

And, on a line like:

title=">PATTERN1 tokyo city whatever whatever title=">PATTERN1 london PATTERN2

is the output supposed to be:

title=">PATTERN1 Tokyo City Whatever Whatever Title=">PATTERN1 London PATTERN2

or:

title=">PATTERN1 tokyo city whatever whatever title=">PATTERN1 London PATTERN2

Aia · February 11, 2016, 10:29pm

Would any of these do it?

perl -ple '$cl=$_; while(/title=">PATTERN1\s+(.*?)\s+PATTERN2/g){$opt=$mpt=$&; $os=$ms=$1; $ms=~s/\b(\w)/\U$1/g; $mpt=~s/$os/$ms/; $cl=~s/$opt/$mpt/}$_=$cl' patterns.file

perl -ne '$cl=$_; while(/title=">PATTERN1\s+(.*?)\s+PATTERN2/g){$opt=$mpt=$&; $os=$ms=$1; $ms=~s/\b(\w)/\U$1/g; $mpt=~s/$os/$ms/; $cl=~s/$opt/$mpt/} print $cl' patterns.file

martinsmith · February 11, 2016, 10:42pm

That's the result I needed (with capitalization only performed between an occurrence of the first string and the next occurrence of the second string). Thank you, Don!
The perl solution from Aia solved the issue.

PATTERN2 whatever goes to tokyo city ....title=">PATTERN1 tokyo city PATTERN2 whatever whatever whatever title=">PATTERN1 london PATTERN2 whatever whatever whatever title=">PATTERN1 new york PATTERN2 whatever whatever whatever ....

perl -ple '$cl=$_; while(/title=">PATTERN1\s+(.*?)\s+PATTERN2/g){$opt=$mpt=$&; $os=$ms=$1; $ms=~s/\b(\w)/\U$1/g; $mpt=~s/$os/$ms/; $cl=~s/$opt/$mpt/}$_=$cl' patterns.file

Result:

PATTERN2 whatever goes to tokyo city ....title=">PATTERN1 Tokyo City PATTERN2 whatever whatever whatever title=">PATTERN1 London PATTERN2 whatever whatever whatever title=">PATTERN1 New York PATTERN2 whatever whatever whatever

Cheers Aia!

Thanks a lot for everyone's help.

Don_Cragun · February 12, 2016, 2:49am

I'm confused.

You said that with the input:

title=">PATTERN1 tokyo city whatever whatever title=">PATTERN1 london PATTERN2

you want the output:

title=">PATTERN1 tokyo city whatever whatever title=">PATTERN1 London PATTERN2

and you said that Aia's perl script does exactly what you want. But both of Aia's perl scripts produce the following output from that input:

title=">PATTERN1 Tokyo City Whatever Whatever Title=">PATTERN1 London PATTERN2

I'm not nearly as good at writing perl scripts as Aia is, but the following awk script seems to do what you have requested for every output you have specified with various sample inputs:

awk '
BEGIN {	OFF = "PATTERN2"
	ON = "title=\">PATTERN1"
}
{	n = on = 0
	for(i = 1; i <= NF; i++)
		if($i == ON) {
			if(!on)	n++
			on = 1
			start[n] = i + 1
		} else if($i == OFF) {
			if(on) {
				on = 0
				stop[n] = i - 1
			}
		}
	if(on)	n--
	for(i = 1; i <= n; i++)
		for(j = start[i]; j <= stop[i]; j++)
			$j = toupper(substr($j, 1, 1)) substr($j, 2)
}
1' file

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk. I'm guessing this still doesn't really do what you want, but you still haven't given us a complete specification for what happens when the three patterns you showed us in your original post in this thread are not paired as shown in the examples in that post.