Delete lines from file using Unix Script

phani333 · June 2, 2010, 12:28am

Hi Experts,

I have a file in the format shown below. The first two lines are the header and trailer; all the rest are transaction lines. I need to delete every line except the first line (the header) and the lines that contain 5000 in the 1st column and 0 in the 5th column.

Can anyone please provide the script code? Thanks in advance.

0000 00000 Header
0000 00000 Trailer
1000 00000 Others.
9999 00000
5000 09999 00000008499 0000999 0
5000 09999 00000008499 0000999 7
5000 09999 00000008499 0000999 0
5000 09999 00000008499 0000999 0
5000 09999 00000008499 0000999 7
4300 09999 00000008499 0000999 0
6000 09999 00000008499 0000999 0
1000 09999 00000008499 0000999 7

naree · June 2, 2010, 1:46am

Hi,
Probably you can try this one.

awk '{ if ($0 ~ /Header/ || ($1 == 5000 && $5 == 0)) { print } }' filename > new

mv new filename

Regards
Naree

pseudocoder · June 2, 2010, 6:37am

If your header line will not always contain the literal word Header, the previous solution will not meet your requirement. In that case, try this sed approach, which always keeps the first line:

sed q file > tmpfile
sed '/^5000.* 0$/!d' file >> tmpfile
mv tmpfile file
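
The two sed passes above can also be folded into one. This is a minimal sketch, assuming the header is always line 1 and that the fifth column is the last field on the line (the pattern ` 0$` tests the end of the line, not column 5 specifically). The sample data and the variable names `sample` and `result` are made up for illustration.

```shell
# Build a small sample file (illustrative data modelled on the thread).
sample=$(mktemp)
cat > "$sample" <<'EOF'
0000 00000 Header
0000 00000 Trailer
5000 09999 00000008499 0000999 0
5000 09999 00000008499 0000999 7
4300 09999 00000008499 0000999 0
EOF

# Line 1: branch to the end of the script, so it is printed untouched.
# Other lines: delete unless they start with 5000 and end in " 0".
result=$(sed -e '1b' -e '/^5000.* 0$/!d' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

This prints to standard output only; redirecting to a temporary file and moving it over the original, as in the post above, would still be needed to change the file itself.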

Scrutinizer · June 2, 2010, 7:48am

A tad shorter:

awk '/Header/ || $1==5000 && $5==0' infile
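
The one-liner above relies on awk giving `&&` higher precedence than `||`, so it reads as `/Header/ || ($1==5000 && $5==0)`. Below is a sketch with the grouping spelled out, and with `NR == 1` standing in for `/Header/` on the assumption that the header is always the first line. The sample data and variable names are illustrative.

```shell
# Illustrative sample, including a short line with no fifth column.
sample=$(mktemp)
cat > "$sample" <<'EOF'
0000 00000 Header
0000 00000 Trailer
9999 00000
5000 09999 00000008499 0000999 0
5000 09999 00000008499 0000999 7
6000 09999 00000008499 0000999 0
EOF

# Keep line 1, plus lines with 5000 in column 1 AND 0 in column 5.
# NF >= 5 guards against short lines: an absent $5 compares equal to 0.
result=$(awk 'NR == 1 || ($1 == 5000 && NF >= 5 && $5 == 0)' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```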

phani333 · June 11, 2010, 2:37am

Hi All,

Thank you very much for the replies. I would like to request a few changes. Please see the sample file at the link below; it needs to be modified by the script.

SampleFile on Flickr - Photo Sharing!

Requirements:

The Unix script should modify the file placed in the /usr/sap/xyz folder.
The first and second lines, which start with 0000, are the header and trailer. The script should prefix H to the first 0000 and T to the second 0000.
The script should remove all lines other than those that have 5000 in the 1st column and 0 in the 5th column.

File Snippet Before Execution of Script:

0000 00000 00000008499 0000999 0
0000 00000 00000008499 0000999 0
1000 00000 00000008499 0000999 0
9999 00000 00000008499 0000999 0
5000 09999 00000008499 0000999 0
5000 09999 00000008499 0000999 7
5000 09999 00000008499 0000999 0
5000 09999 00000008499 0000999 0
5000 09999 00000008499 0000999 7
4300 09999 00000008499 0000999 0
6000 09999 00000008499 0000999 0
1000 09999 00000008499 0000999 7

File Snippet After Execution of Script:

H0000 00000 00000008499 0000999 0
T0000 00000 00000008499 0000999 0
5000 09999 00000008499 0000999 0
5000 09999 00000008499 0000999 0
5000 09999 00000008499 0000999 0

rdcwayx · June 11, 2010, 2:53am

awk 'BEGIN {a=1}
      $1=="0000"&&a==1 {a++;print "H" $0;next}
      $1=="0000"&&a==2 {a=1;print "T" $0;next}
      $1=="5000" && $NF=="0" {print $0}
     ' urfile

But I don't understand how you get the 4 and 8 in the output.

H0000 00000 00000008499 0000999 4
T0000 00000 00000008499 0000999 8
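
The counter-based script above marks 0000 lines wherever they occur in the file. If the header and trailer are always lines 1 and 2, as the requirements state, a sketch keyed on the line number is one alternative. The sample data and variable names here are illustrative, not taken from the real file.

```shell
# Illustrative sample modelled on the "before" snippet in the thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
0000 00000 00000008499 0000999 0
0000 00000 00000008499 0000999 0
1000 00000 00000008499 0000999 0
5000 09999 00000008499 0000999 0
5000 09999 00000008499 0000999 7
4300 09999 00000008499 0000999 0
EOF

result=$(awk '
    NR == 1 { print "H" $0; next }   # header: prefix H
    NR == 2 { print "T" $0; next }   # trailer: prefix T
    $1 == "5000" && $5 == "0"        # keep matching transaction lines
' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```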

phani333 · June 12, 2010, 12:57am

Hi,

Apologies, there is no 4 and 8; that was a typing mistake. My requirement is to prefix H and T to lines 1 and 2 and to remove all other lines except those that have 5000 in the 1st column and 0 in the 5th column.

Thanks,
Phani Akella.

---------- Post updated 06-12-10 at 10:27 AM ---------- Previous update was 06-11-10 at 12:26 PM ----------

Hi rdcwayx,

I have modified your code and it is working fine now. Thank you very much. I replaced $NF with $5.

Code:

#!/bin/sh

for file in /sapmnt/XD5/SAPPI/Pcard/Outbound/*
do

awk 'BEGIN {a=1}
$1=="0000"&&a==1 {a++;print "H" $0;next}
$1=="0000"&&a==2 {a=1;print "T" $0;next}
$1=="5000"&&$5=="0" {print $0}
' "$file" > tempfile

mv tempfile "$file"
done
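
The loop above writes to a fixed file named tempfile in the current directory and moves it over the original even if awk fails. Below is a more defensive sketch that keeps the same awk logic, assuming `mktemp` is available. The function name `filter_file` is hypothetical, and the demo runs on a throwaway directory rather than the real Outbound path.

```shell
#!/bin/sh
# filter_file FILE: rewrite FILE in place (hypothetical helper name).
filter_file() {
    tmp=$(mktemp) || return 1
    if awk 'BEGIN {a=1}
        $1=="0000"&&a==1 {a++;print "H" $0;next}
        $1=="0000"&&a==2 {a=1;print "T" $0;next}
        $1=="5000"&&$5=="0" {print $0}
    ' "$1" > "$tmp"; then
        mv "$tmp" "$1"        # replace the original only if awk succeeded
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demo on a temporary directory with made-up data.
dir=$(mktemp -d)
printf '%s\n' \
    '0000 00000 00000008499 0000999 0' \
    '0000 00000 00000008499 0000999 0' \
    '5000 09999 00000008499 0000999 7' \
    '5000 09999 00000008499 0000999 0' > "$dir/sample.txt"

for file in "$dir"/*; do
    [ -f "$file" ] || continue   # skip directories and unmatched globs
    filter_file "$file"
done

result=$(cat "$dir/sample.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```

Note that mktemp creates the temporary file with restrictive permissions, which the moved file inherits; adjust them afterwards if other users need to read the result.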