[perl script] print the assembly instruction and count the occurrences

acdc · January 22, 2015, 12:21pm

Hi,

I have an input file (a text file) with the following lines.

0x000000 0x5a80 0x0060 BRA.l 0x60 ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:24
0x000002 0x1bc5 RETI ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:30
0x000003 0x6840 MOV R0L,R0L ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:31
0x000004 0x1bc5 RETI ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:35
0x000005 0x6840 MOV R0L,R0L ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:36

The expected output is something like this:

BRA.l 0x60
RETI
MOV R0L,R0L
RETI
MOV R0L,R0L

It should also count the occurrences, like this:
MOV R0L,R0L occurred 2 times
RETI occurred 2 times
BRA.l 0x60 occurred 1 time

So far I have developed this code and need help here:

#!/usr/local/bin/perl -w


my $filename = 'C:\data1.txt';
my @opcode_var = 0;
my @s_words = 0;
my $fun_name = 0;
my $file_name = 0;
my $output_var = 0;
my $remove_hex = 0;
open(FILE,$filename) or die "Could not read from filename";
my @lines = <FILE>;
chop @lines;
my $word = 0;

foreach my $line(@lines) 
{
	if ($line =~ /0x*/)
	{
		chop ($line);
		@opcode_var = split(/ /,$line);
		if($opcode_var[2] =~ /0x*/)
		{
			print "$opcode_var[3] $opcode_var[4]\n";
		}
		else
		{
			if($opcode_var[2] =~ /0x*/)
			{
				print "$opcode_var[2] $opcode_var[3]\n";
			}
		}
	}
}

You can copy this code and print the output. I'm just stuck here. I'm learning to extract the assembly code from a .s file.

Thank you.

Any help is appreciated.

Corona688 · January 22, 2015, 12:40pm

Perl looks like overkill for this.

awk '{ sub(/;.*/, "");
        $1=""; $2="";
        sub(/^[ \r\n\t]*/, "");
        A[$0]++ ; print }
END {
        for(X in A) print X" appeared "A[X]" times";
}' inputfile

derekludwig · January 25, 2015, 5:20pm

Or even:

sed -e 's/ *;.*$//' -e 's/0x[^ ]* //g' inputfile | tee outfile | sort | uniq -c

Perlish solution, code to stdout, report to stderr:

use strict;
use warnings;

$\ = "\n";
my %COUNTS = ();

while (<>) {
    chomp;
    s{\s*;.*$}{};
    s{^\s*(0x[0-9a-z]+\s+)+}{}i;
    print;
    $COUNTS{$_}++;
}

while (my ($w, $n) = each %COUNTS) {
    printf STDERR "%10d %s\n", $n, $w;
}
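
For anyone who prefers the shell, the same "listing on stdout, report on stderr" split can be sketched with sed and awk. This is only an illustration under a few assumptions: the function name list_and_count is made up, the two sed expressions are a rough translation of the Perl substitutions above, and the awk in use is assumed to accept "/dev/stderr" as an output file (gawk and mawk do).

```shell
# Sketch: instruction listing on stdout, counts on stderr.
# list_and_count is a hypothetical name, not part of any tool.
list_and_count() {
    # 1st expression: drop the trailing ";file:..." comment fields.
    # 2nd expression: drop the run of leading 0x... fields, however many there are.
    sed -E -e 's/[[:space:]]*;.*$//' \
           -e 's/^[[:space:]]*(0x[0-9A-Fa-f]+[[:space:]]+)+//' "$@" |
    awk '
        { print; count[$0]++ }          # echo the instruction and tally it
        END {
            for (insn in count)         # order of this loop is unspecified
                printf "%10d %s\n", count[insn], insn > "/dev/stderr"
        }'
}
```

Because the two kinds of output travel on different streams, ordinary redirection keeps them apart, for example: list_and_count file.s > listing.txt 2> counts.txt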

Don_Cragun · January 25, 2015, 6:09pm

It looks like Corona688's script missed one detail; the number of fields at the start of a line beginning with 0x is not a constant 2. This slight modification to his script:

awk '
{	sub(/;.*/, "")
	while($1 ~ /^0/) {
		$1 = ""
		$0 = $0
	}
	sub(/^[ \r\n\t]*/, "")
	print
	A[$0]++
}
END {	print ""
	for(X in A)
		print "\"" X "\" appeared " A[X] " times."
}' file.s

(with the sample input you provided) produces the output:

BRA.l 0x60
RETI
MOV R0L,R0L
RETI
MOV R0L,R0L

"MOV R0L,R0L" appeared 2 times.
"BRA.l 0x60" appeared 1 times.
"RETI" appeared 2 times.

which seems to be what was requested.

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk .
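
To make the field-count point concrete, here is a small side-by-side sketch run on the first sample line, which carries three leading 0x fields rather than two. The function names strip_two_fields and strip_hex_fields are invented for this comparison; the awk bodies follow the two scripts in this thread.

```shell
# Sketch only: both function names are hypothetical.

# Always blanks exactly two leading fields.
strip_two_fields() {
    awk '{
        sub(/;.*/, "")
        $1 = ""; $2 = ""
        sub(/^[ \t]+/, "")
        print
    }'
}

# Blanks leading fields for as long as the first one starts with 0x.
strip_hex_fields() {
    awk '{
        sub(/;.*/, "")
        while ($1 ~ /^0x/) {
            $1 = ""
            $0 = $0        # force awk to re-split the record
        }
        sub(/^[ \t]+/, "")
        print
    }'
}

line='0x000000 0x5a80 0x0060 BRA.l 0x60 ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:24'
printf '%s\n' "$line" | strip_two_fields   # prints: 0x0060 BRA.l 0x60
printf '%s\n' "$line" | strip_hex_fields   # prints: BRA.l 0x60
```

With the fixed two-field version, the second opcode word 0x0060 is left in front of the instruction, so that line would also be counted under a different key.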

ongoto · January 25, 2015, 7:16pm

@acdc
I played with your original code a little and shamelessly stole the %COUNTS hash from derekludwig. I came up with this without much regex...

#!/usr/local/bin/perl -w

use warnings;
use strict; 

# my @s_words = 0;
# my $fun_name = 0;
# my $file_name = 0;
# my $remove_hex = 0;
# my $word = 0;

my $filename = 'C:\data1.txt';
my @opcode_var = 0;
my $output_var = 0;
my %COUNTS = ();

open(my $FILE, '<', $filename) or die "Could not read from $filename";
my @lines = <$FILE>;
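chomp @lines;    # chomp removes only the line ending; chop would eat any last character

foreach my $line(@lines) 
{
    if ($line =~ /^0x/)    # only lines that start with a hex address
    {
        @opcode_var = split(/ /,$line);
        
        if($opcode_var[2] =~ /^0x/)    # two-word opcode: instruction starts at field 3
        {
            $output_var = "$opcode_var[3] $opcode_var[4]";
        }
        else
        {
            if($opcode_var[1] =~ /^0x/)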
            {
                if ($opcode_var[3] =~ /[;]+/)
                {
                    $output_var = "$opcode_var[2]";
                }
                else
                {
                    $output_var = "$opcode_var[2] $opcode_var[3]";
                }
            }
        }
    }
    print "$output_var\n";
    $COUNTS{$output_var}++;
}
print "\n";
while (my ($w, $n) = each %COUNTS)
{
    # printf STDERR "(%d) %s\n", $n, $w;
    printf STDERR "%-14s - occurred %-2d %s\n", $w, $n, $n < 2 ? "time" : "times";
}

close($FILE);

# eof #

# output
# ------
# BRA.l 0x60
# RETI
# MOV R0L,R0L
# RETI
# MOV R0L,R0L
#
# BRA.l 0x60     - occurred 1  time
# MOV R0L,R0L    - occurred 2  times
# RETI           - occurred 2  times
# or ...
# (1) BRA.l 0x60
# (2) MOV R0L,R0L
# (2) RETI

senhia83 · January 26, 2015, 12:01am

Another solution with awk,

$ cat tmp
0x000000 0x5a80 0x0060 BRA.l 0x60 ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:24
0x000002 0x1bc5 RETI ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:30
0x000003 0x6840 MOV R0L,R0L ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:31
0x000004 0x1bc5 RETI ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:35
0x000005 0x6840 MOV R0L,R0L ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:36


$ awk -F";" '{split($1,a,"");  for (i=1;i<=length(a);i++) { if (a[i-1]a[i]!="0x" && a[i-2]==" "  )  { print substr($1,i-1); break;}   }}' tmp
BRA.l 0x60
RETI
MOV R0L,R0L
RETI
MOV R0L,R0L

$ awk -F";" '{split($1,a,"");  for (i=1;i<=length(a);i++) { if (a[i-1]a[i]!="0x" && a[i-2]==" "  )  { print substr($1,i-1); break;}   }}' tmp | sort | uniq -c
      1 BRA.l 0x60
      2 MOV R0L,R0L
      2 RETI
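
The sort | uniq -c pipeline lists the instructions in alphabetical order. If a most-frequent-first report is preferred, one more sort stage is enough. A minimal sketch, assuming the cleaned instruction listing arrives on stdin; the function name rank_instructions is hypothetical.

```shell
# Sketch: rank instructions by how often they occur.
# rank_instructions is a made-up name; it reads one instruction per line on stdin.
rank_instructions() {
    sort |                    # group identical instructions together
    uniq -c |                 # prefix each distinct instruction with its count
    sort -k1,1nr -k2          # highest count first, ties in alphabetical order
}
```

For example, the output of either awk command above can be piped straight into rank_instructions instead of sort | uniq -c.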

acdc · January 27, 2015, 12:40am

Thanks guys. I finally managed to write my own code.

Here it is. But I would still love to re-use the code with your permission.
@senhia83 @ongoto @Don_Cragun @derekludwig @Corona688
Thank you all, guys!