How to read a specific line in a file and write it to a new text file?

I have a list of files in a directory 'dir'. Each file is an HTML file. I need to read each file, get the string that starts with 'http', and write those strings to a new text file. How can I do this with shell scripting?

file1.html

<head>
<url>http://www.google.com</url>
</head>

file2.html

<head>
<url>http://www.yahoo.com</url>
</head>

text.txt

http://www.google.com
http://www.yahoo.com

Assuming your file*.html files contain just what you mentioned in your example:

grep -ho "http:[^<]*" file*.html >>text.txt
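If the files live in a directory and some URLs use https, the same idea can be wrapped in a small function. This is only a sketch: the function name extract_urls is made up for illustration, and it assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX).

```shell
#!/bin/sh
# Hypothetical helper: print every http/https URL found in the
# .html files of the directory given as the first argument.
extract_urls() {
    dir=$1
    # -h: suppress file names, -o: print only the matching part,
    # -E: extended regex so that "s?" matches both http and https.
    # The match stops at "<", a double quote, or whitespace.
    grep -hoE 'https?://[^<"[:space:]]*' "$dir"/*.html
}

# Usage (overwrites text.txt instead of appending to it):
# extract_urls dir > text.txt
```

Using > instead of >> means running it twice does not duplicate the URLs in text.txt.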

You can try this:

# sed -n '/^<url>/s/<[^>]*>//gp' file*.html >>text.txt
# cat text.txt
http://www.google.com
http://www.yahoo.com
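The sed command above only matches when <url> sits at the very start of the line. A variant that also tolerates indentation might look like the sketch below; the function name url_tag_contents is hypothetical, and it assumes each <url>...</url> pair is on a single line.

```shell
#!/bin/sh
# Hypothetical helper: print the text inside <url>...</url>, one per line.
# Reads the files given as arguments, or standard input if none are given.
url_tag_contents() {
    # -n together with the trailing "p" prints only the lines where the
    # substitution matched. [[:space:]]* allows the tag to be indented.
    sed -n 's/^[[:space:]]*<url>\(.*\)<\/url>.*$/\1/p' "$@"
}

# Usage:
# url_tag_contents file*.html > text.txt
```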

regards
ygemici

Maybe you can add "-h" to grep to suppress the filenames.

@ygemici

Oops, thanks... fixed!

How do I get the content without the &CS=3 when a line ends with it?

Ex:

http://www.google.com/test/&CS=3
http://www.google.com/sample/&CS=3
http://www.google.com/hello/&CS=3

text.txt

http://www.google.com/test/
http://www.google.com/sample/
http://www.google.com/hello/

Just remove it:

# sed 's/&CS=3//' file1>text.txt
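The command above removes the first &CS=3 anywhere on a line. If only a trailing &CS=3 should go, the pattern can be anchored with $. A minimal sketch, with a made-up function name strip_cs_suffix:

```shell
#!/bin/sh
# Hypothetical helper: remove "&CS=3" only when it is the last thing on a line.
# Reads the files given as arguments, or standard input if none are given.
strip_cs_suffix() {
    # "&" is literal on the pattern side of s///;
    # the trailing "$" anchors the match to the end of the line.
    sed 's/&CS=3$//' "$@"
}

# Usage:
# strip_cs_suffix file1 > text.txt
```

Lines where &CS=3 appears in the middle of the URL are left unchanged.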

Thank you.