Insert tags where the pattern matches

lxdorney · July 14, 2015, 5:38am

Hi Guys,

How can I achieve this in awk or sed?
Patterns: A.B. No. T-8346 or A.B. No. T-8xxx
will look like this:
Patterns: A.B. No. T-8346<br> or A.B. No. T-8xxx<br>

#cat file.txt

 [A.B. No. T-8346, January 01, 2015 ]
JHON VS. PETER, AGOO PET.

How Old Are You

the file will look like this:

 A.B. No. T-8346<br> January 01, 2015
JHON VS. PETER, AGOO PET.

How Old Are You

Thanks

RavinderSingh13 · July 14, 2015, 5:46am

Hello lxdorney,

The following may help, assuming the line always follows the pattern you gave: [A.B. No. T-8346, January 01, 2015 ] .

 awk '{if($0 ~ /^ \[A\.B/){gsub(/\[|\]/,X,$0);sub(/\,/,"<br>",$0)};print;next} 1'  Input_file

Output will be as follows.

  A.B. No. T-8346<br> January 01, 2015
JHON VS. PETER, AGOO PET.
How Old Are You

Thanks,
R. Singh

RudiC · July 14, 2015, 5:49am

Please be way more careful when preparing specifications! Your desired output does not match the verbal spec.
Where have the opening and closing brackets gone in your output? Where's the comma after the pattern?

And, any attempt from your side?

lxdorney · July 15, 2015, 1:39am

Yes, that's what we want to achieve, thank you so much.
If you could explain it step by step, that would be even better and would help others as well.

Thanks again

RavinderSingh13 · July 15, 2015, 3:25am

Hello lxdorney,

Here is the explanation, I hope it helps. Also, please always show us your own efforts, as we are all here to learn. You can hit the thanks button at the bottom of a post to thank or appreciate anyone.

awk '{if($0 ~ /^ \[A\.B/){gsub(/\[|\]/,X,$0);sub(/\,/,"<br>",$0)};print;next} 1'  Input_file
 
if($0 ~ /^ \[A\.B/)            #### Test whether the line starts with a space followed by [A.B
gsub(/\[|\]/,X,$0)             #### If the condition above is TRUE, replace every [ and ] in that line with X, an unset variable, i.e. the empty string.
;sub(/\,/,"<br>",$0)           #### Then replace the first comma in the line with <br>
;print                         #### Print the line, including any changes made in the steps above
next                           #### Skip the remaining rules and move on to the next input line
1                              #### awk works on a condition/action model: when a condition is TRUE, the action given by the user runs.
                                    1 is an always-TRUE condition with no action, so awk performs the default action,
                                    which is printing the line. Here print;next already runs for every line, so this 1 is never reached.
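
For readers who find the one-liner dense, the same steps can be sketched in awk's pattern/action form. This is only an illustration of the explanation above, not a replacement for the posted command; the temporary file and the variable names input and result are assumptions made for the demo.

```shell
# Sample input copied from the thread, written to a hypothetical temp file.
input=$(mktemp)
printf '%s\n' ' [A.B. No. T-8346, January 01, 2015 ]' \
              'JHON VS. PETER, AGOO PET.' \
              '' \
              'How Old Are You' > "$input"

result=$(awk '
/^ \[A\.B/ {              # only lines that start with " [A.B"
    gsub(/\[|\]/, "")     # delete every [ and ] (same regex as the one-liner)
    sub(/,/, "<br>")      # replace the first comma with <br>
}
{ print }                 # print every line, changed or not
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Written this way there is no need for next or a trailing 1: the unconditional { print } rule handles every line, including the blank one.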

Hope above helps you. Enjoy learning.

Thanks,
R. Singh

Akshay_Hegde · July 15, 2015, 7:36am

In the current context, you can also simplify it like this:

[akshay@localhost tmp]$ cat file
 [A.B. No. T-8346, January 01, 2015 ]
JHON VS. PETER, AGOO PET.

How Old Are You
 [A.C. No. T-8346, January 01, 2015 ]
JHON VS. PETER, AGOO PET.

How Old Are You

If the pattern is found, replace the first comma with <br> and then globally remove the brackets; the +1 makes the whole expression TRUE for every line, so the default action (print the line) always runs.

[akshay@localhost tmp]$ awk '(/^ \[A.B/ && sub(/,/,"<br>") && gsub(/[\[\]]/,""))+1' file

 A.B. No. T-8346<br> January 01, 2015 
JHON VS. PETER, AGOO PET.

How Old Are You
 [A.C. No. T-8346, January 01, 2015 ]
JHON VS. PETER, AGOO PET.

How Old Are You

Don_Cragun · July 15, 2015, 4:38pm

Although the above works with some versions of awk , the standards say the only backslash escapes that are required to be recognized by awk are: \\ , \/ , \" , \a , \b , \f , \n , \r , \t , and \v , and \d , \dd , and \ddd (where d is an octal digit). Anything else following a \ in a bracket expression produces undefined behavior.

In a bracket expression, if the first character inside the brackets (or immediately after the ^ in a non-matching list in a non-matching bracket expression) is a closing bracket, that closing bracket is a character in the list; not a terminator for the matching expression. And, inside a bracket expression, there is nothing special about an opening bracket except when it appears as the start of a collating symbol expression, the start of an equivalence class expression, or the start of a character class expression. So, two standard ways to specify an ERE containing a bracket expression matching the two characters [ and ] in awk are /[][]/ and "[][]" .
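
A small sketch of the two portable forms mentioned above, applied to the sample line from this thread. The variable names line, literal and dynamic are only for the demo.

```shell
line=' [A.B. No. T-8346, January 01, 2015 ]'

# ERE token form: ] comes first inside the brackets, so it is a list member.
literal=$(printf '%s\n' "$line" | awk '{ gsub(/[][]/, ""); print }')

# String form of the same bracket expression.
dynamic=$(printf '%s\n' "$line" | awk '{ gsub("[][]", ""); print }')

printf '%s\n' "$literal" "$dynamic"
```

Both calls should strip the two bracket characters without relying on any backslash escape inside the bracket expression.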

lxdorney · July 15, 2015, 10:17pm

Does this only print the output to the screen? How about writing it to another file with the same filename but in another directory,
something like this:

awk '(/^ \[A.B/ && sub(/,/,"<br>") && gsub(/[\[\]]/,""))+1' file >/tmp/file
awk '(/^ \[A.B/ && sub(/,/,"<br>") && gsub(/[\[\]]/,""))+1' file1 >/tmp/file1

Here is what I have done so far:

find /root/dir/ -name *.* | awk '{print "awk '\'\/\<div" "id\=\\\\\\'"""left\'\\'\'\"""\/{a\=1}\/\<\\\\\\\!""""--" "footer" "template" "--\>\/{print\;a\=0}a\'' "$1}'

Output:

awk '/<div id=\"left\"/{a=1}/<\!-- footer template -->/{print;a=0}a' /root/dir/2000/file1.htm
awk '/<div id=\"left\"/{a=1}/<\!-- footer template -->/{print;a=0}a' /root/dir/2000/file2.htm

I want to achieve something like this:

awk '/<div id=\"left\"/{a=1}/<\!-- footer template -->/{print;a=0}a' /root/dir/2000/file1.htm >/root/dir1/2000/file1.htm
awk '/<div id=\"left\"/{a=1}/<\!-- footer template -->/{print;a=0}a' /root/dir/2000/file2.htm >/root/dir1/2000/file2.htm

Aia · July 15, 2015, 11:13pm

awk '{print "awk '\'\/\<div" "id\=\\\\\\'"""left\'\\'\'\"""\/{a\=1}\/\<\\\\\\\!""""--" "footer" "template" "--\>\/{print\;a\=0}a\'' "$1}'

You know, the reason for all those ungodly escapes is the shell. If you put the awk script in a file and then call it as awk -f filename, it would eliminate quite a bit of clutter.

However, I have the impression that what you're doing is the equivalent of taking an automatic M16 rifle, bending it into a slight curve along its body, attaching a string to the barrel and to the stock, and trying to shoot arrows with it.

Nevertheless, without a clearer explanation of the end result you want, that's what you get.
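
As a rough illustration of the awk -f suggestion, the program from the generated commands can live in a file with no shell escaping at all. The script file, the sample HTML and the variable names below are made up for the demo; only the two patterns come from the thread.

```shell
# Hypothetical script file holding the awk program, free of shell quoting.
script=$(mktemp)
cat > "$script" <<'EOF'
/<div id="left"/           { a = 1 }         # start copying at the opening div
/<!-- footer template -->/ { print; a = 0 }  # print the footer line, then stop
a                                            # print lines while the flag is set
EOF

# Hypothetical sample page to run the script against.
sample=$(mktemp)
printf '%s\n' '<html>' '<div id="left">' 'body text' \
              '<!-- footer template -->' '</html>' > "$sample"

extracted=$(awk -f "$script" "$sample")
printf '%s\n' "$extracted"
rm -f "$script" "$sample"
```

Inside the file the regexes read as plain text, because nothing has to survive a pass through the shell first.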

lxdorney · July 15, 2015, 11:25pm

Yes, you're right. I'm trying to find a way to locate all HTML files, cut out the content between two patterns to clean them up, and write the result to another location.
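
One possible way to do that without generating awk commands as text is a shell loop over the find output. This is a minimal sketch under several assumptions: the source and target trees are temporary stand-ins for /root/dir and /root/dir1, the files end in .htm, and no file name contains a newline.

```shell
# Stand-ins for the source and target trees (assumed layout: <tree>/2000/file1.htm).
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/2000"
printf '%s\n' '<html>' '<div id="left">' 'body text' \
              '<!-- footer template -->' '</html>' > "$src/2000/file1.htm"

# Quote the -name pattern so the shell does not expand it before find sees it.
find "$src" -type f -name '*.htm' | while IFS= read -r path; do
    rel=${path#"$src"/}                 # path relative to the source tree
    mkdir -p "$dst/$(dirname "$rel")"   # recreate the subdirectory in the target
    awk '/<div id="left"/{a=1} /<!-- footer template -->/{print; a=0} a' \
        "$path" > "$dst/$rel"
done

cleaned=$(cat "$dst/2000/file1.htm")
printf '%s\n' "$cleaned"
rm -rf "$src" "$dst"
```

Each cleaned file ends up under the same relative path in the target tree, so the original files are left untouched.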

Akshay_Hegde · July 15, 2015, 11:40pm

Thank you so much, Don. I learned something new from you.