I am currently working on a requirement where I have to filter the characters between two specific fields/patterns in a file and get the total number of characters between the two fields.
REQUIREMENT:
The below content is in a file
I have to get the number of characters between each <test> and its matching </test1> throughout the file.
Using the character counts obtained, find:
Number of tags which have more than 30 characters between them
Number of tags which have fewer than 30 characters between them
sed -n '/<test>/,/<\/test1>/{s/<tag1>.*//;s/<tag2>.*//;p;}' input.txt
With this I get only the required tags in the output, but I am not able to count the characters between the tags, because some tags have newlines between them and some don't. I am new to shell scripting. Please help.
Based on the Input_file samples shown, could you please try the following and let me know if it helps.
awk -v ST=": TAG has characters replacement count is: " '($0 ~ /<test>/ && $0 ~ /<\/test1>/){gsub(/<test>|<\/test1>/,"");N=gsub(/[a-zA-Z]/,"");print ++instance ST N;if(N>30){MAX++} else {MIN++};next} ($0 ~ /<test>/){A=1;sub(/<test>/,"");Q=gsub(/[a-zA-Z]/,"");next} ($0 ~ /<\/test1>/ && A){A="";sub(/<\/test1>/,"");Q+=gsub(/[a-zA-Z]/,"");print ++instance ST Q;if(Q>30){MAX++} else {MIN++};Q=0;next} A{Q+=gsub(/[a-zA-Z]/,"")} END{printf("%s%01d\n%s%01d\n","Number of tags having more than 30 replacement of characters are: ",MAX,"Number of tags having less than 30 replacement of characters are: ",MIN)}' Input_file
Output will be as follows.
1: TAG has characters replacement count is: 3
2: TAG has characters replacement count is: 22
Number of tags having more than 30 replacement of characters are: 0
Number of tags having less than 30 replacement of characters are: 2
EDIT: Adding a non-one-liner form of the solution too.
awk -v ST=": TAG has characters replacement count is: " '
# Opening and closing tag on the same line: count the letters right away.
($0 ~ /<test>/ && $0 ~ /<\/test1>/){
  gsub(/<test>|<\/test1>/,"");
  N=gsub(/[a-zA-Z]/,"");
  print ++instance ST N;
  if(N>30){
    MAX++
  }
  else {
    MIN++
  };
  next
}
# Opening tag only: start a fresh count for this instance.
($0 ~ /<test>/) {
  A=1;
  sub(/<test>/,"");
  Q=gsub(/[a-zA-Z]/,"");
  next
}
# Closing tag of a multi-line instance: add the last line, then report.
($0 ~ /<\/test1>/ && A) {
  A="";
  sub(/<\/test1>/,"");
  Q+=gsub(/[a-zA-Z]/,"");
  print ++instance ST Q;
  if(Q>30){
    MAX++
  }
  else {
    MIN++
  };
  Q=0;
  next
}
# Any line in between the two tags.
A {
  Q+=gsub(/[a-zA-Z]/,"")
}
END{
  printf("%s%01d\n%s%01d\n","Number of tags having more than 30 replacement of characters are: ",MAX,"Number of tags having less than 30 replacement of characters are: ",MIN)
}
' Input_file
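Since it is not settled whether digits and punctuation should count as characters, here is a rough sketch of a variant that lets you choose. It is only an illustration under assumptions: the function name count_between_tags and its second argument (a regex for the characters to count, defaulting to letters as in the script above) are made up for this example, and text outside the <test> ... </test1> pairs is ignored.

```shell
# Hypothetical helper, not part of the original solution.
# $1 = input file
# $2 = regex describing one character to count (default: letters only)
count_between_tags() {
    awk -v cls="${2:-[a-zA-Z]}" '
    {
        line = $0
        while (line != "") {
            if (!inside) {
                # Find the next opening tag; text before it is ignored.
                p = index(line, "<test>")
                if (p == 0) break
                line = substr(line, p + 6)
                inside = 1
                n = 0
            } else {
                p = index(line, "</test1>")
                if (p == 0) {
                    # No closing tag on this line: count it all and go on.
                    n += gsub(cls, "", line)
                    break
                }
                # Count only the part in front of the closing tag.
                chunk = substr(line, 1, p - 1)
                n += gsub(cls, "", chunk)
                line = substr(line, p + 8)
                inside = 0
                print ++instance ": " n
                if (n > 30) more++
                else less++
            }
        }
    }
    END { printf "more than 30: %d\n30 or fewer: %d\n", more, less }
    ' "$1"
}
```

Called as count_between_tags Input_file it counts letters only. Called as count_between_tags Input_file '.' it counts every character between the tags, except the line breaks themselves.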
I'm afraid sed (alone) can't do that, as it can neither calculate nor count. On top of that, your request is not quite clear: does the term "char" as you use it include digits, punctuation, etc., or not? Please specify. If all of that is included, try a combination like