Bash Command To Delete Number from Array

pmurray21 · September 12, 2013, 4:52am

Hi,

I am writing a script to split a log file that could contain multiple days' worth of logs. The second line of the log contains the string "Version ". In my test log, which comprises two days' worth of logs, this string appears twice, once for each day.

Essentially I would like to split on the line before this line; unfortunately, there is nothing unique in it. The script thus far gets the line numbers that "Version " is on and populates an array. My problem is that I am not sure how to subtract one from each array element. The main part of my code looks like this:

IFS=$'\n'
arr=($(grep -n "Version " $filename))
arr=("${arr[@]%%:*}")
unset IFS
for i in "${arr[@]}"
do
  echo "${arr[a]}"
  csplit -f $filename $filename $i &>/dev/null
  rm -f $filename'00'
done
exit 1;

I currently delete the file ending in 00 as a hack, because the script splits on line 2, so that file actually contains only one line. The result from the echo is:

2 1177299

So in this instance, from my one log, I want to create two logs; the second log will begin on line 1177298.

Thoughts on how to do this, or a better way to achieve it?

Cheers

jim_mcnamara · September 12, 2013, 6:26am

Your code is "interesting" syntactically. We can deal with that later.
Example:

exit 1

means return failure, so your script "always fails" as far as the caller is concerned.
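To illustrate the point about exit status, here is a minimal sketch. The report function is a hypothetical helper, not something from the original script; it only shows how a caller sees status 0 versus a non-zero status.

```bash
#!/usr/bin/env bash
# report is a hypothetical helper: run a command and say how its caller sees it.
report() {
  if "$@"; then
    echo "caller sees success"
  else
    echo "caller sees failure (status $?)"
  fi
}

report bash -c 'exit 0'   # what a script ending in "exit 0" looks like
report bash -c 'exit 1'   # what a script ending in "exit 1" looks like
```

So a script that did its job would normally end with exit 0, or simply fall off the end.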

Could you please post sample input?

pmurray21 · September 12, 2013, 7:21am

This is a trimmed example of the input:

Info: ****************************************************************
Info: App:  Version 2.0.1  Date 2013/09/01
Info: ****************************************************************
Info: Started at:         Tue Sep 10 10:30:04 2013
Info: Hostname:           Server1
Info: Configuration file: App
Info: blah
Info: blah
Info: blah
Info: blah
Info: blah
Info: blah
Info: blah
Info: ****************************************************************
Info: App:  Version 2.0.1  Date 2013/09/01
Info: ****************************************************************
Info: Started at:         Tue Sep 11 10:30:03 2013
Info: Hostname:           Server1
Info: Configuration file: App
Info: blah
Info: blah
Info: blah
Info: blah
Info: blah
Info: blah

Note that there are config options in the application to put the date and/or time at the start of each line. In this example I would want two files created from this: one containing lines 1-13 and the second containing lines 14-end.

Cheers

drl · September 12, 2013, 7:36am

Hi.

Addressing the splitting of the file with standard utility csplit:

#!/usr/bin/env bash

# @(#) s1	Demonstrate splitting of file at /pattern/+-count, csplit.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C csplit

FILE=${1-data1}

# Remove debris from previous runs.
rm -f xx??

pl " Input data file $FILE:"
cat $FILE

pl " Current situation of resultant xx?? files (expecting none):"
ls xx??

pl " Results, counts of split file in characters:"
csplit -k -z $FILE '/Version/-1' '{*}'

pl " Current situation, and content of files:"
ls -lgG xx??
pe
head xx??

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
csplit (GNU coreutils) 6.10

-----
 Input data file data1:
Unrelated stuff
Version something
No discernible pattern
Version otherwise

-----
 Current situation of resultant xx?? files (expecting none):
ls: cannot access xx??: No such file or directory

-----
 Results, counts of split file in characters:
34
41

-----
 Current situation, and content of files:
-rw-r--r-- 1 34 Sep 12 06:33 xx00
-rw-r--r-- 1 41 Sep 12 06:33 xx01

==> xx00 <==
Unrelated stuff
Version something

==> xx01 <==
No discernible pattern
Version otherwise

See man pages for details.
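As a smaller variation on the same csplit call, the sketch below assumes GNU csplit (the -z option is a GNU extension) and uses made-up log content and a made-up part_ prefix. It adds -s to suppress the byte counts and -f to choose the output prefix instead of the default xx.

```bash
#!/usr/bin/env bash
# Sketch only: assumes GNU csplit; file names and log content are invented.
workdir=$(mktemp -d)

# Small stand-in for the real log: a separator line, then the "Version " line.
printf '%s\n' \
  'Info: ****' \
  'Info: App:  Version 2.0.1' \
  'Info: day one' \
  'Info: ****' \
  'Info: App:  Version 2.0.1' \
  'Info: day two' > "$workdir/app.log"

# '/Version /-1' : split one line before each line matching "Version ".
# '{*}'          : repeat that pattern as many times as possible.
# -s : do not print byte counts; -z : remove empty pieces;
# -f : prefix for the output files.
csplit -s -z -f "$workdir/part_" "$workdir/app.log" '/Version /-1' '{*}'

ls "$workdir"
```

Because the first match is on line 2, the first piece would be empty; -z removes it, so the numbering of the remaining pieces starts at 00.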

Best wishes ... cheers, drl

pmurray21 · September 12, 2013, 9:37am

Thanks for that!

How can I modify it so that the output filename contains the date from the log line "Started at"?

The second output file in this case might be called output.log_10Sep13_10:30:04 for example to signify the date it was started on.

drl · September 12, 2013, 11:17am

Hi.

Here's one method with perl:

#!/usr/bin/env bash

# @(#) s2	Demonstrate splitting of file at /pattern/+-count, csplit.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C csplit perl

FILE=${1-data2}

# Remove debris from previous runs.
rm -f xx?? output*

pl " Input data file $FILE:"
cat $FILE

pl " Current situation of resultant xx?? files (expecting none):"
ls xx??

pl " Results, counts of split file in characters:"
csplit -k -z $FILE '/Version/-1' '{*}'

pl " Current situation, and content of files:"
ls -lgG xx??
pe
head xx??

pl " Renaming script, sample in perl:"
cat p1

# Rename split file based on content of file.
# Sample data line:
# Info: Started at:         Tue Sep 10 10:30:04 2013
# New file name from fragments of data line:
# output.log_10Sep13_10:30:04
#
# extract line "Info: started at" from file
# split into array, create string from fields 658_7
# close, issue rename file to output.log_<string>
for file in xx*
do
  ./p1 $file
done

pl " Renamed files:"
ls -1 output*

exit 0

producing:

$ ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
csplit (GNU coreutils) 6.10
perl 5.10.0

-----
 Input data file data2:
Unrelated stuff
Info: App:  Version 2.0.1  Date 2013/09/01
Info: ****************************************************************
Info: Started at:         Tue Sep 10 10:30:04 2013
No discernible pattern
Info: App:  Version 2.0.1  Date 2013/09/01
Info: ****************************************************************
Info: Started at:         Tue Sep 11 10:30:03 2013

-----
 Current situation of resultant xx?? files (expecting none):
ls: cannot access xx??: No such file or directory

-----
 Results, counts of split file in characters:
181
188

-----
 Current situation, and content of files:
-rw-r--r-- 1 181 Sep 12 10:14 xx00
-rw-r--r-- 1 188 Sep 12 10:14 xx01

==> xx00 <==
Unrelated stuff
Info: App:  Version 2.0.1  Date 2013/09/01
Info: ****************************************************************
Info: Started at:         Tue Sep 10 10:30:04 2013

==> xx01 <==
No discernible pattern
Info: App:  Version 2.0.1  Date 2013/09/01
Info: ****************************************************************
Info: Started at:         Tue Sep 11 10:30:03 2013

-----
 Renaming script, sample in perl:
#!/usr/bin/env perl

# @(#) p1	Demonstrate manipulation to rename file.

# Info: Started at:         Tue Sep 10 10:30:04 2013
# 0     1       2           3   4   5  6        7
# output.log_10Sep13_10:30:04
#            5 4  7  6
# extract line "Info: started at" from file
# split into array, create string from fields 658_7 (but i-1 for the 0-based array)
# close, issue rename file to output.log_<string>

use strict;
use warnings;

my ( $file, $f, @pieces, $newname );
$file = shift || die " Expected filename, got nothing.\n";
open( $f, "<", $file ) || die " Cannot open file $file\.";
while (<$f>) {
  chomp;
  next if not /Info: Started at:/;
  @pieces  = split(/\s+/);
  $newname = $pieces[5] . $pieces[4] . $pieces[7] . "_" . $pieces[6];
  # Parentheses are needed: without them, || binds to the last argument and die never runs.
  close($f) || die " Oops, cannot close $file\n";
  rename( $file, "output_$newname" ) || die " Cannot rename $file\n";
  last;
}

exit(0);

-----
 Renamed files:
output_10Sep2013_10:30:04
output_11Sep2013_10:30:03

To avoid making the perl script too complex, the file loop is in the shell script. One optimization that could be done is to pull that loop into the perl script.
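For comparison, the renaming step could also stay entirely in the shell. The following is only a sketch: make_name is a hypothetical helper, it assumes bash and the exact "Info: Started at:" field layout shown in the sample, and it builds the output.log_10Sep13_10:30:04 style of name that was asked for rather than the name the perl script produces.

```bash
#!/usr/bin/env bash
# make_name is a hypothetical helper: build a file name from a "Started at" line.
# Assumed layout: Info: Started at: <weekday> <month> <day> <time> <year>
make_name() {
  local _info _started _at _dow mon day time year
  read -r _info _started _at _dow mon day time year <<< "$1"
  # ${year: -2} keeps the last two digits of the year.
  printf 'output.log_%s%s%s_%s\n' "$day" "$mon" "${year: -2}" "$time"
}

make_name 'Info: Started at:         Tue Sep 10 10:30:04 2013'

# Rename each csplit piece that contains a "Started at:" line.
for piece in xx??; do
  [ -f "$piece" ] || continue
  line=$(grep -m1 'Started at:' "$piece") || continue
  mv -- "$piece" "$(make_name "$line")"
done
```

Pieces without a "Started at:" line are left under their xx?? names.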

Best wishes ... cheers, drl

RudiC · September 12, 2013, 3:20pm

How about using this to assign to your array and then continue with your script:

arr=($(grep -B1 -n "Version " file | grep -Eo "^[0-9]+-"))

You might want to use the minus sign as an IFS char.
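As a sketch of two ways to get clean "line before" numbers in bash: the arrays below are hard-coded stand-ins using the line numbers from the first post (2 and 1177299), not output from a real run. The first strips the trailing "-" that grep -B1 -n leaves on context lines; the second keeps the original array and subtracts one from each element.

```bash
#!/usr/bin/env bash
# Stand-in for what the grep pipeline above would return for the poster's log.
raw=(1- 1177298-)
# Strip the trailing "-" from every element.
before=("${raw[@]%-}")

# Alternative: start from the matching line numbers and subtract one in place.
arr=(2 1177299)
for i in "${!arr[@]}"; do
  arr[i]=$(( arr[i] - 1 ))
done

echo "${before[*]}"
echo "${arr[*]}"
```

Both echo lines print the same two numbers, 1 and 1177298.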

Don_Cragun · September 12, 2013, 5:16pm

Both bash and ksh shell variable arrays have limits on the number of elements you can have in an array. If your daily logs could ever grow to more than a couple of thousand lines, you might want to consider a way that doesn't care about those limits.

The following awk script should work as long as the "Started at:" line is always the 4th line in a day's log entries:

awk '!h && $NF=="****************************************************************" {
    # This is the 1st line of a 4 line header.
    if(f) close(f);     # Close previous output file.
    h=1                 # Set header line number.
}
h { H[h]=$0 # Save header lines 1-4.
    if(h++==4) {
        # Set output file name from time/date stamp at end of 4th header line:
        #                   day    month   last 2 digits of year         time
        f = "output.log_" $(NF-2) $(NF-3) substr($NF,length($NF)-1) "_" $(NF-1)
        # Copy the headers to the output file.
        printf("%s\n%s\n%s\n%s\n", H[1], H[2], H[3], H[4]) > f
        h=0     # We are done with the headers for this log.
    }   
    next
}
{   print > f   # Copy remaining lines to the output file.
}' input

If you want to try this on a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of awk.
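To see what the file name expression in the script above produces, it can be run on its own against one "Started at:" line from the sample input. This is just a quick check, with the result kept in a made-up variable called name.

```bash
#!/usr/bin/env bash
# Feed one sample header line to the same field expression used for f above.
name=$(echo 'Info: Started at:         Tue Sep 10 10:30:04 2013' |
  awk '{ print "output.log_" $(NF-2) $(NF-3) substr($NF, length($NF)-1) "_" $(NF-1) }')

echo "$name"
```

Counting from the end of the line with NF means the expression still picks the right fields if a date or time prefix is configured at the start of each line.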