I'm manipulating a batch of about 2,000 HTML files. I just need to make some small changes, but to all the files at once.
For example, I want to delete the lines that have "embed_music" in all the files, or change all instances of the word "Paragraph" to "Absatz".
This is my pseudo-code:
open target folder of html files (/project/html/)
read in all html files
do the stuff here:
    check for lines containing "embed_music"; if found, delete the line
    string replace the word "Paragraph" with "Absatz"
close folder
Is my logic correct? I'm attempting to do this with Python; would another language work better? I'd appreciate any help or feedback!
I'd go with a simple shell script and let find and sed do the hard work:
find /project/html -type f -name "*.html" | while IFS= read -r filename
do
    if [[ ! -f "$filename-" ]] # if a backup exists, don't do anything
    then
        mv "$filename" "$filename-" # make backup
        # delete lines containing embed_music; the g flag replaces every "Paragraph" on a line, not just the first
        sed '/embed_music/d; s/Paragraph/Absatz/g' "$filename-" > "$filename"
    fi
done
For each file, this makes a backup of the original (I like that safety net) and then writes the changed version under the original name. If a backup already exists, the file is skipped, which prevents overwriting your original should something not work right and the script be run again.
Python will certainly work, but I think this is the easiest way.
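If you'd rather stay in Python, here is a rough sketch of the same idea, following the pseudo-code from the question. It is not the only way to do it. The function names transform and process_folder are made up for illustration, and I'm assuming the files are UTF-8 encoded; adjust the encoding if yours differ.

```python
from pathlib import Path


def transform(text: str) -> str:
    """Drop lines containing 'embed_music', then replace 'Paragraph' with 'Absatz'."""
    kept = [line for line in text.splitlines(keepends=True)
            if "embed_music" not in line]
    return "".join(kept).replace("Paragraph", "Absatz")


def process_folder(folder: str, backup_suffix: str = "-",
                   encoding: str = "utf-8") -> int:
    """Rewrite every *.html file under folder; return how many files were processed.

    The encoding default is an assumption, not something known about the files.
    """
    processed = 0
    # Collect the file list first so renaming files cannot disturb the directory walk.
    for path in sorted(Path(folder).rglob("*.html")):
        if not path.is_file():
            continue
        backup = path.with_name(path.name + backup_suffix)
        if backup.exists():  # a backup exists: leave this file alone
            continue
        # Read and write bytes so the original line endings are preserved.
        original = path.read_bytes().decode(encoding)
        path.rename(backup)  # keep the untouched original as the backup
        path.write_bytes(transform(original).encode(encoding))
        processed += 1
    return processed
```

Calling process_folder("/project/html") would walk the folder recursively, like find does, and leave a backup with a trailing "-" next to each rewritten file. I'd try it on a copy of a few files first.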