[perl script] Print the assembly instructions and count the occurrences

Hi,

I have an input file (a text file) with the following lines.

0x000000 0x5a80 0x0060 BRA.l 0x60 ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:24
0x000002 0x1bc5 RETI ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:30
0x000003 0x6840 MOV R0L,R0L ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:31
0x000004 0x1bc5 RETI ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:35
0x000005 0x6840 MOV R0L,R0L ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:36

The expected output is something like this:

BRA.l 0x60
RETI
MOV R0L,R0L
RETI
MOV R0L,R0L

and it should count the occurrences too, like
MOV R0L,R0L occurred 2 times
RETI occurred 2 times
BRA.l 0x60 occurred 1 time

So far I have developed the code below and need help with it.

#!/usr/local/bin/perl -w


my $filename = 'C:\data1.txt';
my @opcode_var = 0;
my @s_words = 0;
my $fun_name = 0;
my $file_name = 0;
my $output_var = 0;
my $remove_hex = 0;
open(FILE,$filename) or die "Could not read from filename";
my @lines = <FILE>;
chop @lines;
my $word = 0;

foreach my $line(@lines) 
{
	if ($line =~ /0x*/)
	{
		chop ($line);
		@opcode_var = split(/ /,$line);
		if($opcode_var[2] =~ /0x*/)
		{
			print "$opcode_var[3] $opcode_var[4]\n";
		}
		else
		{
			if($opcode_var[2] =~ /0x*/)
			{
				print "$opcode_var[2] $opcode_var[3]\n";
			}
		}
	}
}

You can copy this code and print the output. I am just stuck here. I am learning to extract the assembly code from a .s file.

Thank you.

Any help is appreciated.

Perl looks like overkill for this.

awk '{ sub(/;.*/, "");
        $1=""; $2="";
        sub(/^[ \r\n\t]*/, "");
        A[$0]++ ; print }
END {
        for(X in A) print X" appeared "A[X]" times";
}' inputfile

Or even:

sed -e 's/ *;.*$//' -e 's/0x[^ ]* //g' inputfile | tee outfile | sort | uniq -c
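
One caveat with the second expression: 0x[^ ]* followed by a space is not anchored, so with the g flag it removes every hex word that is followed by a space, including a hex operand in the middle of an instruction. The sample input never triggers this, because its only hex operand (0x60) is followed by the comment rather than by another operand. A minimal sketch of an anchored variant, assuming the hex words to drop are always at the start of the line; the function name strip_leading_hex and the ADD line below are made up for illustration:

```shell
#!/bin/sh
# strip_leading_hex is a hypothetical helper name used only in this sketch.
# It reads a listing on stdin, drops the ";file:..." comment, then removes
# only the run of 0x... words at the very start of each line.
strip_leading_hex() {
    sed -e 's/ *;.*$//' \
        -e 's/^\(0x[0-9A-Fa-f]* *\)*//'
}

# The second input line is invented to show a hex operand that is not last.
strip_leading_hex <<'EOF'
0x000000 0x5a80 0x0060 BRA.l 0x60 ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:24
0x000010 0x1234 ADD 0x10 R1 ;made-up line, not from the original listing
EOF
```

With the unanchored expression the invented second line would come out as "ADD R1"; the anchored one keeps the 0x10 operand. The result can still be piped through sort | uniq -c as before.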


Perlish solution, code to stdout, report to stderr:

use strict;
use warnings;

$\ = "\n";
my %COUNTS = ();

while (<>) {
    chomp;
    s{\s*;.*$}{};
    s{^\s*(0x[0-9a-z]+\s+)+}{}i;
    print;
    $COUNTS{$_}++;
}

while (my ($w, $n) = each %COUNTS) {
    printf STDERR "%10d %s\n", $n, $w;
}
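
Sending the listing to stdout and the report to stderr means the two can be captured separately with ordinary shell redirection. Here is a rough awk analogue of that idea (not the Perl script above), wrapped in a shell function. The name list_and_report and the file names in the usage comment are placeholders, and piping to "cat 1>&2" is just one portable way to reach stderr from awk:

```shell
#!/bin/sh
# list_and_report is a hypothetical name for this sketch.
# Instructions go to stdout; the per-instruction counts go to stderr.
list_and_report() {
    awk '
    {
        sub(/[ \t]*;.*/, "")                        # drop the ";file:..." comment
        sub(/^[ \t]*(0x[0-9A-Fa-f]+[ \t]+)+/, "")   # drop the leading hex words
        if ($0 == "") next                          # skip lines with nothing left
        print
        count[$0]++
    }
    END {
        # for-in order is unspecified, so the report lines are unordered
        for (insn in count)
            printf "%10d %s\n", count[insn], insn | "cat 1>&2"
    }'
}

# Usage (placeholder file names): keep the two streams in separate files.
# list_and_report < inputfile > code.txt 2> report.txt
```

Running it without the 2> redirection shows both streams on the terminal, which is usually fine for a quick look.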

It looks like Corona688's script missed one detail: the number of fields beginning with 0x at the start of a line is not always 2. This slight modification to his script:

awk '
{	sub(/;.*/, "")
	while($1 ~ /^0/) {
		$1 = ""
		$0 = $0
	}
	sub(/^[ \r\n\t]*/, "")
	print
	A[$0]++
}
END {	print ""
	for(X in A)
		print "\"" X "\" appeared " A[X] " times."
}' file.s

(with the sample input you provided) produces the output:

BRA.l 0x60
RETI
MOV R0L,R0L
RETI
MOV R0L,R0L

"MOV R0L,R0L" appeared 2 times.
"BRA.l 0x60" appeared 1 times.
"RETI" appeared 2 times.

which seems to be what was requested.

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk .
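
Note that awk does not specify the order in which for (X in A) visits the array, so the report lines can come out in a different order on another awk. If a stable order matters, one option is to pipe the report through sort from inside the END block. A minimal sketch under that assumption; the function name count_sorted is made up, it prints only the report, and the test is tightened to /^0x/ on the assumption that every address and opcode word starts with 0x:

```shell
#!/bin/sh
# count_sorted is a hypothetical name for this sketch.
# It prints only the report: count first, highest count first,
# ties ordered by the instruction text.
count_sorted() {
    awk '
    {
        sub(/;.*/, "")              # drop the ";file:..." comment
        while ($1 ~ /^0x/) {        # strip however many hex words lead the line
            $1 = ""
            $0 = $0                 # re-split so the next word becomes $1
        }
        sub(/^[ \t]+/, "")
        if ($0 != "") A[$0]++
    }
    END {
        for (X in A)
            print A[X], X | "sort -k1,1nr -k2"
    }'
}

# Usage (file.s as in the command above):
# count_sorted < file.s
```

Putting the count first keeps the sort key simple; the quoted "appeared N times" wording could be restored with one more awk or sed stage after the sort.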


@acdc
I played with your original code a little and shamelessly stole the %COUNTS hash from derekludwig. I came up with this, without much regex:

#!/usr/local/bin/perl -w

use warnings;
use strict; 

# my @s_words = 0;
# my $fun_name = 0;
# my $file_name = 0;
# my $remove_hex = 0;
# my $word = 0;

my $filename = 'C:\data1.txt';
my @opcode_var = 0;
my $output_var = 0;
my %COUNTS = ();

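open(my $FILE, '<', $filename) or die "Could not read from $filename: $!";
my @lines = <$FILE>;
chomp @lines;    # chomp removes only the line ending; chop removes any last character

foreach my $line (@lines)
{
    if ($line =~ /^0x/)    # /0x*/ would match any line containing a "0"
    {
        @opcode_var = split(/ /, $line);

        if ($opcode_var[2] =~ /^0x/)    # three hex words before the mnemonic
        {
            $output_var = "$opcode_var[3] $opcode_var[4]";
        }
        else
        {
            if ($opcode_var[1] =~ /^0x/)    # two hex words before the mnemonic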
            {
                if ($opcode_var[3] =~ /[;]+/)
                {
                    $output_var = "$opcode_var[2]";
                }
                else
                {
                    $output_var = "$opcode_var[2] $opcode_var[3]";
                }
            }
        }
    }
    print "$output_var\n";
    $COUNTS{$output_var}++;
}
print "\n";
while (my ($w, $n) = each %COUNTS)
{
    # printf STDERR "\(\%d\) \%s\n", $n, $w;
    printf STDERR "%-14s - occurred %-2d %s\n", $w, $n, $n < 2 ? "time" : "times";
}

close($FILE);

# eof #

# output
# ------
# BRA.l 0x60
# RETI
# MOV R0L,R0L
# RETI
# MOV R0L,R0L
#
# BRA.l 0x60     - occurred 1  time
# MOV R0L,R0L    - occurred 2  times
# RETI           - occurred 2  times
# or ...
# (1) BRA.l 0x60
# (2) MOV R0L,R0L
# (2) RETI

Another solution with awk:

$ cat tmp
0x000000 0x5a80 0x0060 BRA.l 0x60 ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:24
0x000002 0x1bc5 RETI ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:30
0x000003 0x6840 MOV R0L,R0L ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:31
0x000004 0x1bc5 RETI ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:35
0x000005 0x6840 MOV R0L,R0L ;file:UserCall.s ;function:_user_call_table ;C_sourceLine:36


$ awk -F";" '{split($1,a,"");  for (i=1;i<=length(a);i++) { if (a[i-1]a[i]!="0x" && a[i-2]==" "  )  { print substr($1,i-1); break;}   }}' tmp
BRA.l 0x60
RETI
MOV R0L,R0L
RETI
MOV R0L,R0L

$ awk -F";" '{split($1,a,"");  for (i=1;i<=length(a);i++) { if (a[i-1]a[i]!="0x" && a[i-2]==" "  )  { print substr($1,i-1); break;}   }}' tmp | sort | uniq -c
      1 BRA.l 0x60
      2 MOV R0L,R0L
      2 RETI


Thanks guys. I finally managed to write my own code.

Here it is. But I would still love to re-use the code, with your permission.
@senhia83 @ongoto @Don Cragun @derekludwig Corona688
Thank you all, guys!