Perl: How do I remove leading non-alpha characters?

Hi,

Sorry for the silly question, but I'm trying to write a Perl script to process a log file that is in the following format:

(4)ab=1234/(10)bc=abcdef9876/cd=0....

The number in the brackets is the length of the field; "/" is the field separator. Not every field has a leading bracketed length.

What I'm trying to do is print the log in this format:

ab=1234
bc=abcdef9876
cd=0

So far I've written the code below:

#!/bin/perl

$LOGFILE = "/path/to/logfile/filename.txt";
open(LOGFILE) or die("Could not open log file.");
foreach $line (<LOGFILE>) {
    
    @splitted = split(/\//, $line);
    
    foreach $element (@splitted){
        print "$element\n";
    }
}
close(LOGFILE);

However this prints out the leading brackets as well.

How can I get rid of the leading brackets?

Also, a field value may contain "/", e.g. "ef=a/b". How do I keep that from being misinterpreted as the field separator?

Thanks!

Please show some of the real data; there might be a clue in it that helps figure out a rule for splitting the lines up correctly. From what you posted, it looks like you could use the field "names" (ab, bc, cd) to help split the fields up correctly, but I have a feeling that is pseudo data, not real data.

What kind of log file is this? There might already be a module written that understands the log format.

Anyways, for the lines with no forward slash in the values:

#!/bin/perl
use strict;
use warnings;

my $logfile = "/path/to/logfile/filename.txt";
open(my $fh, '<', $logfile) or die "Could not open log file: $!";
while (<$fh>) {
   chomp;
   my @fields = split(/\//);     # split the line on "/"
   s/^\(\d*\)// for @fields;     # strip a leading "(n)" length prefix, if any
   print "$_\n" for @fields;
}
close($fh);

The simplest way to tackle this problem is at the source, by not using "/" as the field separator.

Input File:

$ cat line.txt
(4)ab=1234/(10)bc=abcdef9876/cd=0
(4)ty=5234/(10)bc=abcdef9876/cd=0

Code:

perl -nle '/(\w+)=(\w+)/ && print "$1=$2" foreach split "/"' < line.txt

Output:

ab=1234
bc=abcdef9876
cd=0
ty=5234
bc=abcdef9876
cd=0

HTH

Considering he said the values can contain a forward slash, it seems doubtful that this will work.
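
To illustrate the concern, here is a minimal sketch using a made-up line (not real log data) in which the "ef" value contains a "/", as in the "ef=a/b" example from the question. It assumes the "(n)" prefix is simply stripped after splitting on every "/".

```perl
use strict;
use warnings;

# Hypothetical line: the "ef" value contains a "/" (as in "ef=a/b").
my $line = "(4)ab=1234/(3)ef=a/b/cd=0";

# Naive approach: treat every "/" as a field separator.
my @fields = split m{/}, $line;
s/^\(\d+\)// for @fields;    # strip the leading "(n)" length prefix

# The value "a/b" is cut in two: "ef=a" and "b" come out as separate fields.
print "$_\n" for @fields;
```

The "b" ends up on a line of its own, so any approach that splits on every "/" needs extra logic to glue such pieces back together.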

Thanks for the good tips so far.

The data is just records of users accessing data. I don't think there are any existing modules for this data, as the format is very specific to this log and not in general use.

The fields could contain, e.g., mt=image/gif (the media type downloaded; it could be any MIME type, really).

There is also a field for the browser type, e.g. "bt=Mozilla/4".

A real example would look like this:

at=200802221200/cs=59278/(9)mt=image/gif/(9)bt=Mozilla/4...

This tells the time of access to the media (at), the content size (cs), the media type (mt) and the browser that was used to access the content (bt). There are about 100 different field names; they are all two-letter combinations followed by "=" and then the value, ending with the field separator "/", which I unfortunately can't change.

Maybe the data could be split on the field names somehow, as KevinADC suggested.

Thanks

One possible way:

#!/bin/perl
use strict;
use warnings;

my $logfile = "/path/to/logfile/filename.txt";
open(my $fh, '<', $logfile) or die "Could not open log file: $!";
while (<$fh>) {
   chomp;
   s/\(\d+\)//g;                # remove the (n) part
   s#/([a-z]{2}=)#:::$1#g;      # convert the delimiter before each field name to :::
   my @fields = split(/:::/);   # split using the new delimiter
   print "$_\n" for @fields;
}
close($fh);

Someone better with regexps can probably write something shorter, and possibly more efficient, using zero-width lookahead/lookbehind assertions, which I am not too good with.
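
For what it's worth, a lookahead version might look roughly like the sketch below. It uses the sample line posted earlier (without the trailing "...") and assumes, as described in the thread, that every field name is two lowercase letters followed by "=". It shares the limitation of the approach above: a value that itself contains something like "/xy=" would still be split in the wrong place.

```perl
use strict;
use warnings;

# Sample line taken from the example posted earlier in the thread.
my $line = "at=200802221200/cs=59278/(9)mt=image/gif/(9)bt=Mozilla/4";

# Split only on a "/" that is followed by an optional "(n)" prefix and a
# two-letter field name plus "=". The lookahead (?=...) consumes nothing,
# so the prefix and the name stay attached to the following field.
my @fields = split m{/(?=(?:\(\d+\))?[a-z]{2}=)}, $line;

# Drop the leading "(n)" length prefix where present.
s/^\(\d+\)// for @fields;

print "$_\n" for @fields;
```

This avoids the temporary ":::" delimiter, since the split pattern itself decides which slashes are real separators.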

Thanks a lot!! That seems to work perfectly.

Input

$ cat line.txt
(8)xx=1234/xyz/at=200802221200/cs=59278/(9)mt=image/gif/(9)bt=Mozilla/4
(8)zz=9999/abc/at=200902221200/cs=59278/(9)mt=text/html/(9)bt=Mozilla/4

Code

perl -nle '(/^(\w+)=(\w+)/ && print "$1=$2") || (/^\w+$/ && print) || (/^\(/ && /(\w+)=(\w+)/ && printf "$1=$2/") foreach split /\//' < line.txt

Output

xx=1234/xyz
at=200802221200
cs=59278
mt=image/gif
bt=Mozilla/4
zz=9999/abc
at=200902221200
cs=59278
mt=text/html
bt=Mozilla/4
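
Another option might be to use the bracketed length itself instead of guessing where a value ends. The sketch below is only an illustration: it assumes the "(n)" prefix counts the characters of the value after "name=" (which is consistent with the samples in this thread, e.g. "(9)mt=image/gif"), and that any value containing "/" carries such a prefix. Neither assumption is confirmed for the full log. The helper name parse_line is made up for this example.

```perl
use strict;
use warnings;

# Parse one log line into "name=value" strings.
# ASSUMPTION: a "(n)" prefix gives the number of characters in the value
# that follows "name="; fields without a prefix contain no "/".
sub parse_line {
    my ($line) = @_;
    my @fields;
    pos($line) = 0;
    while (pos($line) < length($line)) {
        if ($line =~ /\G\((\d+)\)([a-z]{2})=/gc) {
            # Length-prefixed field: take exactly n characters as the value.
            my ($len, $name) = ($1, $2);
            push @fields, "$name=" . substr($line, pos($line), $len);
            pos($line) = pos($line) + $len;
        }
        elsif ($line =~ /\G([a-z]{2})=([^\/]*)/gc) {
            # No prefix: the value runs up to the next "/".
            push @fields, "$1=$2";
        }
        else {
            die "Cannot parse at offset " . pos($line) . ": $line\n";
        }
        $line =~ /\G\//gc;    # skip the field separator, if there is one
    }
    return @fields;
}

print "$_\n"
    for parse_line("(8)xx=1234/xyz/at=200802221200/cs=59278/(9)mt=image/gif/(9)bt=Mozilla/4");
```

Because the value is taken by length, it does not matter what characters it contains; the script dies on a line that does not fit the assumed layout rather than printing a wrong split.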