How to read a specific line in a file and write it to a new text file?

vel4ever · February 15, 2012, 4:09am

I have a list of files in a directory 'dir'. Each file is an HTML file. I need to read each file, get the string that starts with 'http', and write those strings to a new text file. How can I do this with shell scripting?

file1.html

<head>
<url>http://www.google.com</url>
</head>

file2.html

<head>
<url>http://www.yahoo.com</url>
</head>

text.txt

http://www.google.com
http://www.yahoo.com

ctsgnb · February 15, 2012, 4:13am

Assuming your file*.html files contain only what you mentioned in your example:

grep -ho "http:[^<]*" file*.html >>text.txt
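
To try the command above on the sample data from the question, something like the following sketch may help. The scratch directory and the use of `>` instead of `>>` are assumptions made for the demo, not part of the original answer. Note that `-h` and `-o` are supported by GNU and BSD grep but are not required by POSIX.

```shell
#!/bin/sh
# Sketch: recreate the two sample files in a scratch directory (an assumption
# for this demo) and extract every http URL into text.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '<head>' '<url>http://www.google.com</url>' '</head>' > file1.html
printf '%s\n' '<head>' '<url>http://www.yahoo.com</url>' '</head>' > file2.html

# -h: do not prefix each match with the file name
# -o: print only the part of the line that matched
# '>' truncates text.txt, so running the script again does not append duplicates
grep -ho 'http:[^<]*' file*.html > text.txt

cat text.txt
```

With `>>` as in the original command, each run appends to text.txt, which is what you want only if the file should accumulate results across runs.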

ygemici · February 15, 2012, 4:37am

You can try this:

# sed -n '/^<url>/s/<[^>]*>//gp' file*.html >>text.txt
# cat text.txt
http://www.google.com
http://www.yahoo.com

regards
ygemici
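
The sed command above only matches lines where `<url>` is the very first thing on the line. If the real files indent the tag or put other markup on the same line, a variant along these lines might be more forgiving. This is a sketch under that assumption; the indented sample file is made up for the demo.

```shell
#!/bin/sh
# Sketch: extract the URL between <url> and </url> even when the tag is
# indented. The sample file below is hypothetical.
workdir=$(mktemp -d)
printf '%s\n' '<head>' '  <url>http://www.google.com</url>' '</head>' \
  > "$workdir/file1.html"

# \( ... \) captures the URL; \1 replaces the whole line with the capture.
# -n plus the trailing 'p' prints only the lines where the substitution matched.
sed -n 's/.*<url>\(http[^<]*\)<\/url>.*/\1/p' "$workdir"/file*.html \
  > "$workdir/text.txt"

cat "$workdir/text.txt"
```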

---------- Post updated at 11:37 AM ---------- Previous update was at 11:34 AM ----------

Maybe you can add "-h" to grep to suppress the filenames.

ctsgnb · February 15, 2012, 4:42am

@ygemici

Oops, thanks... fixed!

vel4ever · February 15, 2012, 4:47am

How do I handle content that ends with &CS=3, so that the &CS=3 part is removed?

Ex:

http://www.google.com/test/&CS=3
http://www.google.com/sample/&CS=3
http://www.google.com/hello/&CS=3

text.txt

http://www.google.com/test/
http://www.google.com/sample/
http://www.google.com/hello/

ygemici · February 15, 2012, 5:02am

Just remove it:

# sed 's/&CS=3//' file1>text.txt
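
Since the question asks about lines that end with &CS=3, it may be safer to anchor the pattern to the end of the line so that an &CS=3 appearing elsewhere is left alone. A minimal sketch, assuming the URLs are stored one per line in a file named file1; the third sample line is made up to show the difference.

```shell
#!/bin/sh
# Sketch: strip "&CS=3" only when it is the last thing on the line.
workdir=$(mktemp -d)
printf '%s\n' \
  'http://www.google.com/test/&CS=3' \
  'http://www.google.com/sample/&CS=3' \
  'http://www.google.com/a&CS=3/b' > "$workdir/file1"   # last line is hypothetical

# '$' anchors the match to the end of the line.
# '&' is an ordinary character in the pattern; it is special only in the replacement.
sed 's/&CS=3$//' "$workdir/file1" > "$workdir/text.txt"

cat "$workdir/text.txt"
```

Without the `$`, the first &CS=3 on every line would be removed, including the one in the middle of the third line.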

vel4ever · February 15, 2012, 6:40am

Thank you.