HP Unix Script to Delete the lines in a file

Hi Experts,

I have a file in the format shown below. I would like a Unix script (HP Unix) that can:

  1. Remove the first 6 and last 3 lines.
  2. Delete the lines where the 3rd column contains an alphanumeric value.
  3. Delete the lines where the 4th column is 0.00.
  4. Calculate the sum of all the values in the 4th column. If it is 0, write "Sum is Zero" to a log file; otherwise rename the file as an invalid file.

Sample File Snippet

"RPT TRIALB           ABC   LIMITED"
"                     TRIAL BALANCE"
"                     FOR THE PERIOD ENDED DECEMBER 2009"
""
""
""
"A","Retail Bank1","1234",1000,0.00,738295.08,0.00
"B","Retail Bank2","5678",2000,0.00,738295.08,0.00
"C","Retail Bank3","9101",2000,0.00,738295.08,0.00
"D","Retail Bank4","A984",-2000,0.00,738295.08,0.00
"E","Retail Bank5","23215",0.00,0.00,738295.08,0.00
"","","",,,,
"","","",,,,
"END OF REPORT"

Can you show us the code you have so far, and tell us how it misbehaves?

Hi Franklin,

I am new to Unix scripting. I have found the code in bits and pieces, but I am not sure how to combine them into a single script file to achieve this.

e.g.

head -n -3 file name --> will remove last 3 lines
sed '1,6d' file name --> will remove first 6 lines
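
A note on the two commands above: a negative count for `head -n` is a GNU extension rather than POSIX, so it may not be accepted by the `head` shipped with HP Unix. The sketch below chains both steps using only a positive count. The helper name `strip_report` is hypothetical, and the line count comes from `wc -l`.

```shell
#!/bin/sh
# strip_report FILE: print FILE without its first 6 and last 3 lines.
# (strip_report is a hypothetical helper name used for illustration.)
strip_report() {
    total=$(wc -l < "$1")
    keep=$((total - 6 - 3))          # number of lines left in the middle
    [ "$keep" -gt 0 ] || return 0    # file too short: nothing to print
    sed '1,6d' "$1" | head -n "$keep"
}

# Example call: strip_report bankfile > bankfile.trimmed
```
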
#!/usr/local/bin/perl

use strict;
use warnings;

my $bankfile='bankfile';
my $bfinvalid="$bankfile" . ".invalid";
my $logfile='logfile';
my @temparr=();
my $sum=0;

open(BF,$bankfile) or die "Error opening input file $bankfile: $!\n";
my @bankdata=<BF>;
close(BF);

# Drop the 6 header lines.
for (1 .. 6) {
    shift(@bankdata);
}

# Drop the 3 trailer lines.
for (1 .. 3) {
    pop(@bankdata);
}

# Keep records whose 3rd column has no letters and whose 4th column is not 0.00.
foreach my $record (@bankdata) {
    chomp($record);
    my @splitarr = split(/,/,$record);
    push(@temparr, "$record\n") if ($splitarr[2] !~ /[a-zA-Z]/ && $splitarr[3] !~ /^0\.00$/);
}

# Sum the 4th column of the remaining records.
foreach my $amount (@temparr) {
    chomp($amount);
    my @splitarr = split(/,/,$amount);
    $sum += $splitarr[3];
}

if ($sum == 0) {
    open(LF,">$logfile") or die "Error opening logfile $logfile: $!\n";
    print LF "Bankfile $bankfile: Sum is Zero\n";
    close(LF);
}
else {
    rename $bankfile, $bfinvalid;
}

You might want to change this line

open(LF,">$logfile") or die "Error opening logfile $logfile: $!\n";

to

open(LF,">>$logfile") or die "Error opening logfile $logfile: $!\n";

if you want to _append_ the line "Bankfile $bankfile: Sum is Zero" to already existing logfile, instead of creating a new logfile.

Another one:

awk -F, -v n=$(wc -l < file) '
NR < 7 || $3 ~ /[A-Za-z]/ || int($4)==0 {next}
NR==(n-2){exit}
{s+=$4}
END{if(!s)print "Sum is Zero" > "Logfile"}
1' file

Hi Frank,

Thank you very much for the script. I want to ignore the 4th condition; I am happy if the script handles the first 3 conditions for me. I think I should remove the last 3 lines from your code to achieve this. Thank you.

awk -F, -v n=$(wc -l < file) '
NR < 7 || $3 ~ /[A-Za-z]/ || int($4)==0 {next}
NR==(n-2){exit}

{s+=$4}
END{if(!s)print "Sum is Zero" > "Logfile"}
1' file

Try:

awk -F, -v n=$(wc -l < file) '
NR < 7 || $3 ~ /[A-Za-z]/ || int($4)==0 {next}
NR==(n-2){exit}
1' file
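
In the filter above, the `exit` rule only runs if line n-2 gets past the first rule. In the sample that line is `"","","",,,,`, which the first rule already skips, so the trailer is dropped by the column tests and not by the line count. The sketch below is a variant that checks the line number before anything else. The function name `filter_report` is hypothetical, and the 4th column is compared numerically (`$4 + 0 == 0`) instead of through `int()`.

```shell
#!/bin/sh
# filter_report FILE: drop 6 header lines, 3 trailer lines, rows whose 3rd
# column contains a letter, and rows whose 4th column is numerically zero.
# (filter_report is a hypothetical name used for illustration.)
filter_report() {
    awk -F, -v n="$(wc -l < "$1")" '
    NR <= 6         { next }    # header block
    NR >  n - 3     { exit }    # trailer block: stop before the last 3 lines
    $3 ~ /[A-Za-z]/ { next }    # alphanumeric 3rd column
    $4 + 0 == 0     { next }    # 0 or 0.00 in the 4th column
    { print }
    ' "$1"
}

# Example call: filter_report file > file.filtered
```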

Hi Franklin,

Thank you very much. I am able to create the file as desired. I have to run this script as a monthly job.

I will get a file with the naming convention SampleFile_YYYYMM.txt, which means the file name will change every month, e.g. SampleFile_201006.txt.

I have to write a script that reads this file and makes the required changes without changing the file name.

How can I handle this? I have modified the code to write the output to a file, but I am not sure how to create the output file with the same name.

awk -F, -v n=$(wc -l < /sapmnt/XD5/SAPPI/Outbound/SampleFile_YYYYMMDD.txt) '
NR < 7 || $3 ~ /[A-Za-z]/ || int($4)==0 {next}
NR==(n-2){exit}
1' /sapmnt/XD5/SAPPI/Outbound/SampleFile_YYYYMMDD.txt > /sapmnt/XD5/SAPPI/Outbound/SampleFile_YYYYMMDD.txt

awk -F, -v n=$(wc -l < /sapmnt/XD5/SAPPI/Outbound/SampleFile_YYYYMMDD.txt) '
NR < 7 || $3 ~ /[A-Za-z]/ || int($4)==0 {next}
NR==(n-2){exit}
1' /sapmnt/XD5/SAPPI/Outbound/SampleFile_YYYYMMDD.txt > tempfile

mv tempfile /sapmnt/XD5/SAPPI/Outbound/SampleFile_YYYYMMDD.txt

Dear Franklin,

Thank you very much for all your support.

I have a small problem here. The file name is not fixed; it could be anything. How can we handle that case?

Can we code it like this?

variable = ls (I guess that by running 'ls' we get the name of the file in the variable), and then we can use the variable in place of the file name.

You could use a loop like this. Play around with it, but back up your files first:

#!/bin/sh

for file in /sapmnt/XD5/SAPPI/Outbound/SampleFile_*
do
  awk -F, -v n=$(wc -l < "$file") '
  NR < 7 || $3 ~ /[A-Za-z]/ || int($4)==0 {next}
  NR==(n-2){exit}
  1' "$file" > tempfile

  mv tempfile "$file"
done
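
One thing to watch with the loop above: if no file matches, the shell leaves the pattern unexpanded, and the loop body runs once with the literal string ending in `SampleFile_*`. The sketch below adds a guard for that case and renames only when awk succeeded. The wrapper name `process_dir` and the hidden temp file name are assumptions made for illustration.

```shell
#!/bin/sh
# process_dir DIR: apply the filter to every regular file DIR/SampleFile_*.
# (process_dir and the ".filter.$$" temp name are illustrative assumptions.)
process_dir() {
    for file in "$1"/SampleFile_*
    do
        [ -f "$file" ] || continue    # unmatched pattern stays literal; skip it
        tmp="$1/.filter.$$"           # hidden name, so it never matches SampleFile_*
        awk -F, -v n="$(wc -l < "$file")" '
        NR < 7 || $3 ~ /[A-Za-z]/ || int($4)==0 {next}
        NR==(n-2){exit}
        1' "$file" > "$tmp" && mv "$tmp" "$file"
    done
}

# Example call: process_dir /sapmnt/XD5/SAPPI/Outbound
```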

Hi,

The code is working nicely. I am facing an issue: I am getting a couple of special characters in the file (assume ! and *). Can I add a line to the code below to delete these two special characters (! and *) from the entire file before writing it to the output file?

Code:

#!/bin/sh

for file in /sapmnt/XD5/SAPPI/Outbound/SampleFile_*
do
  awk -F, -v n=$(wc -l < "$file") '
  NR < 7 || $3 ~ /[A-Za-z]/ || int($4)==0 {next}
  NR==(n-2){exit}
  1' "$file" > tempfile

 mv tempfile "$file"
done

Try:

#!/bin/sh

for file in /sapmnt/XD5/SAPPI/Outbound/SampleFile_*
do
  awk -F, -v n=$(wc -l < "$file") '
  NR < 7 || $3 ~ /[A-Za-z]/ || int($4)==0 {next}
  NR==(n-2){exit}
  {gsub(/[!*]/,"")}
  1' "$file" > tempfile

 mv tempfile "$file"
done
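
As an alternative sketch, the two characters can also be removed outside awk by piping the filtered output through `tr -d`. Inside a bracket expression `|` is an ordinary character, so a set that should match only these two characters is `[!*]`. The helper name `strip_marks` is hypothetical.

```shell
#!/bin/sh
# strip_marks: copy stdin to stdout with every "!" and "*" removed.
# (strip_marks is a hypothetical helper name used for illustration.)
strip_marks() {
    tr -d '!*'
}

# Example use inside the loop: awk ... "$file" | strip_marks > tempfile
```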

Dear Franklin,

I am facing a small problem. The above code works fine, but it also deletes lines containing values like 0.01, -0.56, or 0.44 in the 4th column. My requirement is to delete only those rows whose 4th column value is 0.

I think I need to change the line pasted below. The code deletes all lines with values such as 0.1, 0.01, 0.56, etc., but it should delete only lines with the value 0 or 0.00. Please help me.
int($4)==0

Use:

$4==0

instead of:

int($4)==0
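
The difference comes from `int()` truncating toward zero, so every value strictly between -1 and 1 becomes 0, while `$4==0` compares the field numerically as written. A small sketch that prints how each test treats a few sample values (the function name `show_zero_tests` is hypothetical):

```shell
#!/bin/sh
# show_zero_tests: for each sample value, report whether int(x)==0 and x==0
# would drop or keep the row.
# (show_zero_tests is a hypothetical name used for illustration.)
show_zero_tests() {
    printf '%s\n' 0 0.00 0.01 -0.56 0.44 1000 |
    awk '{
        printf "%s int:%s cmp:%s\n", $1,
               (int($1) == 0 ? "drop" : "keep"),
               ($1 == 0 ? "drop" : "keep")
    }'
}

show_zero_tests
```

Only 0 and 0.00 are reported as "drop" by the plain comparison; the `int()` test also drops 0.01, -0.56 and 0.44.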

Hi Franklin,

Thank you very much. It's working now.