Perl substr or similar help

I have a large string containing about 17,500 characters, and I would like to obtain the value of token. token appears only once in the entire string, towards the end around the 17,200 mark, but that position could change. Using Perl, can someone help me obtain the value, which in this particular case is 1685540303 as shown below? The length of the value can be shorter or longer, but not by much. Below is a small part of the entire string. Thank you.

<input type='hidden' name='token' value='1685540303'>

This is a DOM element in an HTML document, isn't that right?

<input type='hidden' name='token' value='1685540303'>

That depends completely upon what the other 17,000 characters are.

Yes, HTML and JavaScript. I don't want to use the Perl module HTML::TokeParser::Simple. Thanks again.

You can try my awk XML processor.

awk -f yanx.awk -e 'TAG == "INPUT" && ARGS["NAME"]=="token" { print ARGS["VALUE"] }' ORS="\n" file.xml

If it doesn't work, you'll probably have to show us some of the other 17,000 characters. HTML is full of special cases that choke simple text processors.
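Attributes can appear in any order and with single, double, or no quotes, which is one of those special cases. A rough core-Perl sketch (no HTML::TokeParser::Simple needed) that scans every `<input>` tag and parses its attributes into a hash might look like this. The sub name `input_value` is made up for illustration, and the sketch still ignores comments, scripts, and `>` inside attribute values.

```perl
use strict;
use warnings;

# Return the value attribute of the first <input> whose name attribute equals
# $name. Attributes may appear in any order and use single, double, or no
# quotes. Sketch only: comments, CDATA, and '>' inside values are not handled.
sub input_value {
    my ($html, $name) = @_;
    while ($html =~ /<input\b([^>]*)>/gi) {       # each <input ...> tag, any case
        my $attrs = $1;
        my %attr;
        while ($attrs =~ /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g) {
            # keep whichever quoting style matched; attribute names lowercased
            $attr{ lc $1 } = defined $2 ? $2 : defined $3 ? $3 : $4;
        }
        return $attr{value}
            if defined $attr{name} && $attr{name} eq $name;
    }
    return;    # no matching input found
}

my $content = q{<form><input type="text" name="user" value="bob">}
            . q{<input type='hidden' name='token' value='1685540303'></form>};
print input_value($content, 'token'), "\n";    # prints 1685540303
```

Because the name comparison happens after all attributes are collected, `<INPUT VALUE="42" NAME="sessionKey">` is found just as well as the order shown in the post.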


So I figured out a solution for myself.

Note: Below is just a small portion of my entire HTML string. The entire HTML string contains roughly 17,500 characters.

<input type='hidden' name='token' value='1685540303'>

my $loc = index($content, 'token');        # find start of 'token' in the entire string (-1 if absent)
die "token not found\n" if $loc < 0;
my $str = substr $content, $loc + 12, 19;  # grab 19 chars starting at the '=' after value
my ($result) = $str =~ m/'(.*?)'/          # capture the text between the single quotes
    or die "token value not found\n";      # avoids silently reusing a stale $1

If someone can refine this, please feel free to post it. Thank you.
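One possible refinement, assuming the attribute order and quoting shown above (name before value): a single regex anchored on `name='token'` replaces the index/substr pair. It is not thrown off if the word token also appears earlier in the page (for example in JavaScript), and it does not depend on a fixed offset or on the value's length.

```perl
use strict;
use warnings;

# Sample input; in the real script $content holds the whole ~17,500-character page.
my $content = q{... <input type='hidden' name='token' value='1685540303'> ...};

# Find name='token' followed by value='...' and capture what is between the
# value's quotes. Accepts single or double quotes; assumes name comes first.
my ($token) = $content =~ /name=['"]token['"]\s+value=['"]([^'"]*)['"]/
    or die "token not found\n";

print "$token\n";    # prints 1685540303
```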

This is not how web developers process the DOM in HTML, especially on the client side.

Should I assume you are processing the value on the server side?

Either way, your post confused me. On the client side, we easily get the value attribute of the input DOM element with JavaScript:

var value = document.querySelector("input[name='token']").getAttribute("value");

On the server side, you would normally use PHP or Perl, and you receive the value when the browser submits the form to the server.

PHP Example:

$value = $_GET['token'];

or

$value = $_POST['token'];

We web developers do not usually treat HTML files as big strings and use Perl, PHP, or any other language to extract a value, because this is easily done on either the client or the server side with existing built-in methods.

Getting the value of an HTML DOM attribute does not require a script like the one you have written. Hidden values are normally submitted with HTML forms using the POST method.

This means that on the server side you would simply read the value of $_POST['token'] when the form is submitted.
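Since the thread is about Perl, here is a rough Perl counterpart of reading a posted field: decode the application/x-www-form-urlencoded body that a browser sends for a POST form and look the field up by its name attribute. This is a hand-rolled sketch (the sub name `parse_form` is hypothetical); in practice a CGI or PSGI framework does this decoding for you.

```perl
use strict;
use warnings;

# Decode an application/x-www-form-urlencoded body into a hash keyed by each
# field's name attribute. Sketch: repeated field names keep only the last value.
sub parse_form {
    my ($body) = @_;
    my %field;
    for my $pair (split /&/, $body) {
        next unless length $pair;                  # skip empty pairs like "a=1&&b=2"
        my ($k, $v) = split /=/, $pair, 2;
        for ($k, $v) {
            next unless defined;
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;     # decode %XX escapes
        }
        $field{$k} = $v // '';
    }
    return \%field;
}

# On a real server the body would be read from STDIN; a literal stands in here.
my $form = parse_form('token=1685540303&note=hello+world');
print $form->{token}, "\n";    # prints 1685540303
```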


From my OpenBSD firewall, I use Perl to reboot my cable modem around 3:00 am. I do this daily to get a new IP address from my ISP; otherwise I keep the same IP for weeks. The procedure I use:

Send the modem a GET request with the username and password to log in. (It won't accept a POST for this, only a GET.)
After logging in, send a GET request for the webpage (HTML) that contains the sessionKey.
Note: I parse this page using Perl to get the sessionKey (the token value).
Send a POST request to the modem with the sessionKey and the reboot command to reboot it. (The sessionKey is required.)

Perl is part of the OpenBSD base install, so I use it along with the Perl module HTTP::Tiny to do the above. I was using the Perl module HTML::TokeParser::Simple to parse the webpage (HTML) returned by my GET request, to obtain the sessionKey (token value) that I send in my POST request.
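The procedure above can be sketched with HTTP::Tiny plus a core-Perl regex in place of HTML::TokeParser::Simple. The modem address, the paths, and the form field names (`username`, `password`, `action`) are hypothetical placeholders, since the thread doesn't show them. It also assumes the modem does not rely on a login cookie, because HTTP::Tiny only keeps cookies when given a `cookie_jar` object.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# HYPOTHETICAL values: replace with your modem's real address, paths, and fields.
my $BASE        = 'http://192.168.100.1';
my $LOGIN_PATH  = '/login';
my $STATUS_PATH = '/restart.html';
my $REBOOT_PATH = '/goform/restart';

# Pull the value of the hidden sessionKey input out of the page (core Perl only).
sub session_key {
    my ($html) = @_;
    my ($key) = $html =~ /name=['"]sessionKey['"]\s+value=['"]([^'"]*)['"]/;
    return $key;
}

# GET a URL and return its body, dying on any non-2xx response.
sub fetch {
    my ($http, $url) = @_;
    my $res = $http->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

sub reboot_modem {
    my ($user, $pass) = @_;
    my $http = HTTP::Tiny->new(timeout => 15);

    # 1. Log in with a GET request carrying the credentials in the query string.
    my $query = $http->www_form_urlencode({ username => $user, password => $pass });
    fetch($http, "$BASE$LOGIN_PATH?$query");

    # 2. Fetch the page holding the sessionKey and extract it.
    my $key = session_key(fetch($http, "$BASE$STATUS_PATH"))
        // die "sessionKey not found\n";

    # 3. POST the reboot command together with the sessionKey.
    my $res = $http->post_form("$BASE$REBOOT_PATH",
        { sessionKey => $key, action => 'reboot' });
    die "reboot POST failed: $res->{status} $res->{reason}\n" unless $res->{success};
    print "reboot requested\n";
}

# Only contact the modem when exactly two arguments (user, password) are given.
reboot_modem(@ARGV) if @ARGV == 2;
```

Run it as `perl reboot.pl USER PASS` (for example from cron at 3:00 am). Without exactly two arguments it only defines the subs and exits, which makes `session_key` easy to test on a saved copy of the page.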

The reason I asked about parsing the HTML with Perl is that HTML::TokeParser::Simple isn't installed on OpenBSD by default, and I had to install it as a package. I was just trying to find another way.


Excellent explanation.

Thanks.

Had you tried my earlier suggestion, you would have found that it works, with one correction: the HTML you posted was wrong. The input is not named 'token'; it is named 'sessionKey'.

So my final code:

$ awk -f yanx.awk -e 'TAG == "INPUT" && ARGS["NAME"]=="sessionKey" { print ARGS["VALUE"] }' ORS="\n" my.html
1685540303

$

I intentionally changed the name to token for the same reason I didn't want to post the entire HTML page here: it's my cable modem's reboot page. I do appreciate your feedback and the yanx.awk solution you provided. I just didn't want to deal with an additional file (yanx.awk) that I would need to keep track of for future use.
