I can find and replace text when the delimiters are unique. What I cannot do is replace text using two NON-unique delimiters:
For example:
"This html code <text blah >contains <garbage blah blah >. All tags must go,<text > but some must be replaced with <garbage blah blah > without erasing other info."
delimiter1: '<garbage'
delimiter2: '>'
replace with: 'important info'
delimiter3: '<'
delimiter4: '>'
replace with: ''
I get this:
This html code contains important info
And I want this:
This html code contains important info. All tags must go, but some must be replaced with important info without erasing other info.
The issue is that the program keeps picking up the '>' that belongs to the '<text blah >' tag instead of the '>' that belongs to the '<garbage' tag.
In my real-world scenario, these tags are much more complicated: they contain varying text in between, differ in length, and have different endings. Also, certain tags must be removed first, others second, and so on, so changing the order will not help in this situation.
I want my code to understand that the '>' delimiter I use as the end position for the '<garbage' tag can only be the one that comes closest AFTER '<garbage' (if it understood that, it could not make a mistake), but I do not know how to do this. I have it working perfectly in an awk program, but not in C++. And I will not use Boost; in that case I'd rather just stick with awk.
Here is my code:
// Compile and run with:
//
//   g++ -O -Wall replace.cpp -o replace
//
#include <iostream>
#include <string>
#include <fstream>
using namespace std;

string replaceText(string text, string tStart, string tStop, string tReplace)
{
    long int begPos;
    long int endPos;
    int found = 1;
    while ((text.find(tStart) != std::string::npos) && (found == 1)) {
        found = 0;
        begPos = text.find(tStart);
        endPos = text.find(tStop);
        if (tStop != "")
        {
            text.replace(begPos, endPos - begPos + tStop.length(), tReplace);
            found = 1;
        } else {
            text.replace(begPos, tStart.length(), tReplace);
            found = 1;
        }
        // Used for testing to see positions of replaced text:
        std::cout << "Replacing from: " << tStart << " ...to... " << tStop << " at Start Pos: " << begPos << " Stop Pos: " << endPos << " with " << tReplace << " \n" << endl;
    }
    return text;
}

int main(int argc, char* argv[])
{
    string keyFound = "This html code <text blah >contains <garbage blah blah >. All tags must go,<text > but some must be replaced with <garbage blah blah > without erasing other info.";
    // Run this code twice: once with the line below commented out, and once without:
    keyFound = replaceText(keyFound, "<garbage", ">", "important info");
    keyFound = replaceText(keyFound, "<", ">", "");
    std::cout << keyFound << endl;
    return 0;
}
I am not expecting a complete answer, but perhaps someone could point me to a resource that covers this. I've looked all around and cannot seem to find anything. Also, I am new to C++.
I understand that this is an incredibly complicated thing with no simple answer.
Thank you.