I need help:
I started receiving automatic emails containing download information. The problem is that these emails arrive in a rich format (I have no control over this), so the important information is buried under a bunch of mumbo-jumbo. To complicate things even further, I need to automate the download process too, so I need to somehow identify and extract the exact path to the file and forward it for further processing.
the relevant part of the email looks something like this:
so the part that I need to extract from here is
afp://server.company.com/del/e/QQ888-9999/QQ888-9999-3/QQ888-999-3.dmg
the problem is that the path to the file is split with "=" so that would have to be removed somehow (if present)
also I am not sure how to remove anything present before afp:// (like href=3D" in this case) or anything present after .dmg (like ">del/QQ888-9999/QQ888-9999-3</a></td= in this case)
to expand on this, most of the time I would get an email with not one, but two files to download (and two to avoid).
would you mind suggesting a loop that would extract both afp links?
for example:
afps to get:
afp://MYserver.company.com/del/e/QQ888-9999/QQ888-9999-/QQ888-9999-3.dmg
and
afp://MYserver.company.com/del/e/QQ666-7777/QQ666-7777-/QQ666-7777-3.dmg
both buried in the rich formatting nonsense.
to make things a bit more complicated, the email would also contain a couple of afp links to a different server, which would need to be skipped
for example
afps to be skipped:
afp://NOTMYserver.company.com/del/e/QQ888-9999/QQ888-9999-/QQ888-9999-3.dmg
and
afp://NOTMYserver.company.com/del/e/QQ666-7777/QQ666-7777-/QQ666-7777-3.dmg
again, unbelievable. thank you guys, what would take me days (if not weeks) to figure out is sometimes just a couple of posts away. anyway this will be a great starting point for me to learn something new and useful.
The solutions posted so far fail to cope with the soft line breaks in the encoding (an = followed by a newline). Also, other characters might or might not be encoded using quoted-printable (the =3D in your sample is an encoded =). I would recommend that you split the processing into two steps: decoding the QP, and extracting the information you want.
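As a rough illustration of the decoding step, here is a minimal sketch in Python (not Perl) using the standard quopri module. The function name decode_qp and the sample string are made up for illustration; the sample is only pieced together from the fragments quoted in this thread, and it assumes the message body is UTF-8 or plain ASCII.

```python
import quopri


def decode_qp(raw: bytes) -> str:
    """Decode a quoted-printable body.

    Soft line breaks ('=' followed by a newline) are dropped and
    =XX escapes such as =3D are turned back into the characters
    they stand for.
    """
    # Assumption: the body is UTF-8/ASCII; undecodable bytes are replaced.
    return quopri.decodestring(raw).decode("utf-8", errors="replace")


# Hypothetical fragment: an encoded '=' (=3D) and a path split by a soft break.
sample = (
    b'<a href=3D"afp://server.company.com/del/e/QQ888-9999/QQ8=\n'
    b'88-9999-3/QQ888-9999-3.dmg">del/QQ888-9999/QQ888-9999-3</a>'
)
print(decode_qp(sample))
```

After this step the link is back on one line with a plain href=" in front of it, so the extraction no longer has to special-case the = signs.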
You could pipe the output from that to what you already have (use it instead of the tr command you had before) or extend the Perl script to also extract the information you require:
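For the extraction step, one possible approach is sketched below in Python rather than Perl. It assumes the text has already been QP-decoded, that every wanted link starts with afp:// and ends in .dmg, and that links never contain quotes, whitespace, or angle brackets. The name extract_afp_links and the sample text are hypothetical; the host names are the ones from the question.

```python
import re

# From "afp://" up to the first ".dmg", never crossing quotes,
# whitespace, or angle brackets.
AFP_LINK = re.compile(r'afp://[^\s"<>]+?\.dmg')


def extract_afp_links(decoded_text, wanted_host):
    """Return the afp:// .dmg links on wanted_host, in order, without duplicates."""
    # The trailing slash keeps "NOTMYserver..." from matching "MYserver...".
    prefix = f"afp://{wanted_host}/".lower()
    links = []
    for link in AFP_LINK.findall(decoded_text):
        if link.lower().startswith(prefix) and link not in links:
            links.append(link)
    return links


# Hypothetical decoded body with two wanted links and two to skip.
sample_body = (
    '<a href="afp://MYserver.company.com/del/e/QQ888-9999/QQ888-9999-/QQ888-9999-3.dmg">a</a>\n'
    '<a href="afp://NOTMYserver.company.com/del/e/QQ888-9999/QQ888-9999-/QQ888-9999-3.dmg">b</a>\n'
    '<a href="afp://MYserver.company.com/del/e/QQ666-7777/QQ666-7777-/QQ666-7777-3.dmg">c</a>\n'
    '<a href="afp://NOTMYserver.company.com/del/e/QQ666-7777/QQ666-7777-/QQ666-7777-3.dmg">d</a>\n'
)

for link in extract_afp_links(sample_body, "MYserver.company.com"):
    print(link)
```

The loop prints one link per line, so the output can be handed to whatever does the actual download.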