Script to test bookmarks

Does anyone have a script to test for invalid bookmarks? I have a ton of them and I'd bet that a good percentage could be dropped.

Thanks

I'm guessing you mean web browser bookmarks. What format/structure are they in? And how would you define "invalid"?

@ RudiC

Hello. Invalid would mean 'broken links'; sites that no longer exist or are unreachable, etc.

I can extract the http links from html files (firefox) with sed easily enough, producing a list to loop through. I'm just not sure whether to use wget or something else to attempt to reach the sites and get the error messages. I want to do it in the console, not the browser. I thought I'd ask around for a script or advice from someone who might have already done this.
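For anyone following along, the extraction step might look roughly like the sketch below. It assumes the Firefox export keeps each link in an HREF="..." attribute; the function name extract_urls is made up for illustration, and it uses grep -o in front of sed rather than sed alone.

```shell
#!/bin/bash
# Hypothetical helper: print each unique http(s) link found in a
# Firefox-exported bookmarks file, one URL per line.
extract_urls() {
    # -o prints only the matching part; -i also accepts lowercase href.
    grep -oiE 'href="https?://[^"]+"' "$1" |
        # Strip everything up to the opening quote, then the closing quote.
        sed -E 's/^[^"]*"//; s/"$//' |
        # Sort bytewise and drop duplicate URLs.
        LC_ALL=C sort -u
}
```

Something like `extract_urls bookmarks.html > urls.lst` would then produce the list to loop through. Non-http entries (for example place: queries) are skipped by the pattern.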

Thanks for the reply.

Hmm. The biggest problem I see is how many sites redirect bad links to valid pages. Try http://www.unix.com/qwerty/ui/op for an example. It's one thing to catch a 404 -- another to check that the contents were what you wanted.

You can weed out pages that redirect, at least.

$ wget --max-redirect=0 --spider burningsmell.org/asdf burningsmell.org/index.html burningsmell.org/index.php -nv -O /dev/null
0 redirections exceeded.
0 redirections exceeded.
2014-12-20 23:57:32 URL: http://burningsmell.org/index.php 200 OK

$

Use -i filename to read a list of URLs to check from a file.
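If it helps to keep redirected links separate from dead ones rather than lumping them together, a rough sketch along these lines may work. Note that it swaps in curl instead of wget, because curl's -w option can print the final status code and the number of redirects on one line. The function names classify and check_url and the verdict words are assumptions for illustration only.

```shell
#!/bin/bash
# Hypothetical helper: turn an HTTP status code and a redirect count
# into a one-word verdict.
classify() {
    local code=$1 redirects=$2
    if [ "$code" = "000" ]; then
        echo unreachable        # curl prints 000 when no HTTP response arrived
    elif [ "$code" -lt 200 ] || [ "$code" -ge 300 ]; then
        echo broken             # final answer was not a 2xx success
    elif [ "$redirects" -gt 0 ]; then
        echo redirected         # reachable, but possibly a soft 404
    else
        echo ok
    fi
}

# Probe one URL with curl and print "<verdict> <url>".
check_url() {
    local result
    # -L follows redirects; -w prints the final status and the hop count.
    result=$(curl -s -L -o /dev/null --max-time 5 \
                  -w '%{http_code} %{num_redirects}' "$1")
    # $result is left unquoted on purpose so it splits into two arguments.
    echo "$(classify $result) $1"
}
```

Calling `check_url` for each line of the URL list gives one verdict per bookmark. The "redirected" ones still need a look by hand, since a redirect alone does not say whether the content is what you wanted.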

Thanks for that Corona688

wget does the deed nicely. I found that there are some false positives. Some websites just don't appreciate robots, so wget reports 'broken link' even though the site is actually alive and reachable. But for my purposes that's no big deal.

What I have so far...

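#!/bin/bash
# Check every URL in urls.lst and remove the broken ones from bookmarks.html.

SORTED_LIST=./urls.lst
BOOKMARKS=./bookmarks.html
toDEL=./del.lst
> "$toDEL"    # start with an empty delete list

while IFS= read -r XX
do
    [ -n "$XX" ] || continue    # skip blank lines
    # --spider checks the URL without saving the page
    log=$(wget --tries=1 -T 5 --spider -nv "$XX" 2>&1)
    if grep -q "broken link" <<< "$log"; then
        echo "$log"
        echo "$XX" >> "$toDEL"
        # grep -F -- "$XX" "$BOOKMARKS"    # preview the lines to be removed
        # -F treats the URL as a literal string, so '.', '?' and '&' are safe
        grep -vF -- "$XX" "$BOOKMARKS" > "$BOOKMARKS.tmp"
        mv "$BOOKMARKS.tmp" "$BOOKMARKS"
    fi
done < "$SORTED_LIST"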

Found 370 broken links out of 4,000+ bookmarks.
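With a few hundred broken links, rewriting the bookmarks file once per URL gets slow. Since del.lst already holds every broken URL, the removal could be done in a single pass at the end instead. This is only a sketch: prune_bookmarks is a made-up name, and it assumes each bookmark sits on its own line, as in the Firefox export.

```shell
#!/bin/bash
# Hypothetical helper: print the bookmarks file ($2) without any line
# that contains a URL listed in the delete list ($1).
prune_bookmarks() {
    # Drop blank lines from the list first, because an empty pattern
    # would match (and therefore remove) every line.
    grep -v '^$' "$1" |
        # -F: literal strings, -f -: read patterns from stdin, -v: invert.
        grep -vFf - "$2"
}
```

Usage would be something like `prune_bookmarks del.lst bookmarks.html > bookmarks.clean.html`, which also leaves the original file untouched until the result has been checked. As with the per-URL approach, a listed URL that is a prefix of a longer one removes both lines.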