Formatting File Using Shell Script

ataneja7 · August 9, 2012, 4:36am

Hi Team,

We need to format an input file with a shell script so that it meets the conditions below.

1. Ignore the first 549 characters of the file.

2. After that, split the data into lines of 100 characters each, repeating until the third condition is met.

3. If the word 'CONTRA' is found in any line, with C at character position 65 and the final A at position 70, stop processing. The rest of the data, including the 'CONTRA' line, must be removed from the file.

For example, if the input file is:

  VOL1000000                               851447                                1^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  HDR1A851447S         00000000010001       12199 12201 000000                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  HDR2F0200000100                                   00                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  UHL1 12200999999    000000001 DAILY  001          0000                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ******************^^^^^^^^^^^^^^
  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXCONTRACTORS      
  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXXXXXXXXXXXXXX      
  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXXXXXXXXXXXXXX      
  5555555555555555555555555555555    5555555555555555 44444445 555CONTRA            ZXXXXXXXXXXXXXXXXX      
  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXXXXXXXXXXXXXX      
  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXXXXXXXXXXXXXX

My script should output only the records shown below:

  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXCONTRACTORS       
  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXXXXXXXXXXXXXX      
  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXXXXXXXXXXXXXX

Any suggestions/opinions will be highly appreciated.

Thanks,
Ajay

Franklin52 · August 9, 2012, 5:15am

Try this:

awk '$4 ~ /CONTRA$/{exit} $1 ~ /^[0-9]/' file

ataneja7 · August 9, 2012, 5:23am

I am getting this output:
awk: record `60021351261367099500...' has too many fields
record number 6
Can you please explain what the above command does?

I have also tried to do it myself, but my code has some loopholes.

RudiC · August 9, 2012, 5:30am

Try sed with extended regex, assuming the leading 549 chars are in the first 5 lines, which will be deleted:

sed  -r '1,5d; /^.{66}CONTRA/,$d'

The line containing CONTRA in col 67 (as in your sample file above; adapt the 66 if need be) and every line after it will be deleted as well. I can't guarantee the line length of 100 chars, though, as the command does not modify the lines it prints.

Franklin52 · August 9, 2012, 5:32am

Does the file have a different format than the one you provided in your first post?
Anyway try nawk or /usr/xpg4/bin/awk on Solaris.

Explanation:

awk '$4 ~ /CONTRA$/{exit} $1 ~ /^[0-9]/' file

If the 4th field ends with "CONTRA", exit.
If the 1st field starts with a number, print the line.
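
To see the two rules in action, here is a small sketch run against made-up records (shortened stand-ins for the sample data, not the real file). The wrapper name demo_rules is hypothetical; the awk program is the one quoted above.

```shell
# demo_rules is a hypothetical wrapper; it reads records on standard input.
demo_rules() {
    # Rule 1: stop at the first record whose 4th field ends in CONTRA.
    # Rule 2: print records whose 1st field starts with a digit.
    awk '$4 ~ /CONTRA$/ {exit} $1 ~ /^[0-9]/'
}

demo_rules <<'EOF'
HDR1 0001 12199 12201
5555 5555 4444 555AAAAAA ZXXXX
5555 5555 4444 555CONTRA ZXXXX
5555 5555 4444 555AAAAAA ZZZZZ
EOF
```

Only the second record is printed: the header fails the digit test, and awk exits on the third record before it or anything after it is printed.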

ataneja7 · August 9, 2012, 7:15am

Hi RudiC/Franklin52,

I am able to remove the first 549 characters and then reformat the data into lines of 100 characters.

The only problem I am facing is that I cannot delete the data starting from the line where I encounter the word CONTRA, with C at character position 65 and the final A at position 70.
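
The three conditions can be sketched as one pipeline. This is a minimal sketch under assumptions, not a tested solution for the real file: the function name format_eft is hypothetical, the data after the skipped header is assumed to be one continuous stream without newlines, and tail is assumed to accept the POSIX form -c +N.

```shell
# format_eft is a hypothetical helper name.
# $1 = number of leading characters to skip, $2 = input file
format_eft() {
    skip=$1
    infile=$2
    # tail -c +N starts output AT byte N, so skipping 549 bytes means N = 550.
    start=`expr "$skip" + 1`
    tail -c +"$start" "$infile" |
        fold -w 100 |
        awk 'substr($0, 65, 6) == "CONTRA" { exit } { print }'
}
```

Called as format_eft 549 infile > outfile, this prints the 100-character lines that come before the first line with CONTRA in positions 65 to 70. Because substr compares a fixed position, a line such as ZXXXXCONTRACTORS with CONTRA elsewhere does not stop the output.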

PS: Yes, Franklin52, my formatted input file has a slightly different format from the one posted above, e.g.:

VOL1000000 851447 1^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
HDR1A851447S 00000000010001 12199 12201 000000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
HDR2F0200000100 00 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UHL1 12200999999 000000001 DAILY 001 0000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
******************^^^^^^^^^^^^^^
5555555555555555555555555555555 5555555555555555 44444445 555AAAAAA ZXXXXCONTRACTORS
5555555555555555555555555555555 5555555555555555 44444445 555AAAAAA ZXXXXXXXXXXXXXXXXX
5555555555555555555555555555555 5555555555555555 44444445 555AAAAAA ZXXXXXXXXXXXXXXXXX
5555555555555555555555555555555 555555555 55555 44444445 555CONTRA ZXXXXXXXXXXXXXXXXX
5555555555555555555555555555555 5555555555555555 44444445 555CONTRA ZXXXXXXXXXXXXXXXXX
5555555555555555555555555555555 5555555555555555 44444445 555AAAAAA ZXXXXXXXXXXXXXXXXX

RudiC · August 9, 2012, 7:30am

Could you please post the output of the above sed command applied to the file you gave in post #1? Given that file (including 2 leading spaces per line), the sed command yields

  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXCONTRACTORS      
  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXXXXXXXXXXXXXX      
  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXXXXXXXXXXXXXX

which is what you wanted, isn't it? If there are no leading spaces, reduce the 66 to 64 in the command.
Does your sed version support the -r option (use extended regex)?
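
If -r is not available, the same idea can be written with a basic regular expression, where the repeat count is spelled \{66\}. This is a sketch under assumptions: the interval form is part of POSIX basic regular expressions, but whether a given older sed accepts it is not confirmed here, and strip_to_contra is a hypothetical name.

```shell
# strip_to_contra is a hypothetical helper name.
# Delete the first 5 lines, then quit without printing at the first line
# that has CONTRA starting in column 67; print every other line.
strip_to_contra() {
    sed -n -e '1,5d' -e '/^.\{66\}CONTRA/q' -e 'p' "$@"
}
```

Usage would be strip_to_contra infile > outfile. Quitting at the match gives the same output as deleting from the match to the end of the file, and it avoids reading the rest of the input.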

ataneja7 · August 9, 2012, 11:25am

No, it does not support the -r option.

---------- Post updated at 08:19 AM ---------- Previous update was at 06:41 AM ----------

Are any other options available?

---------- Post updated at 10:25 AM ---------- Previous update was at 08:19 AM ----------

Hi Team,

I have achieved my objective, but one new problem has come up.

My variables do not keep their values outside my loop.

Can you help me with this?

Corona688 · August 9, 2012, 11:26am

Your loop is probably behind a pipe, which puts it in an independent subshell.

Probably, anyway. I can't actually see your computer from here, so please post your code.
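
A small sketch of the effect described here, using made-up input rather than the poster's file. In bash and dash a loop on the right-hand side of a pipe runs in a subshell by default, so the counter set inside it is not visible afterwards; other shells (ksh, for example) may behave differently. Feeding the loop from a file with a redirect keeps the counter in POSIX-style shells.

```shell
# Pipe version: the loop may run in a subshell, so piped can still be 0 here.
piped=0
printf 'a\nb\nc\n' | while read line; do
    piped=`expr $piped + 1`
done
echo "after pipe: $piped"

# Redirect version: no pipe, the loop reads from a temporary file.
tmp=/tmp/count_demo.$$
printf 'a\nb\nc\n' > "$tmp"
direct=0
while read line; do
    direct=`expr $direct + 1`
done < "$tmp"
rm -f "$tmp"
echo "after redirect: $direct"
```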

ataneja7 · August 9, 2012, 11:32am

My code is as below.

#!/bin/sh
# Script: EFT.sh
# Purpose: Formatting file to be processed by bank
# Revision History :
#
# Name             Date        Change
# ---------------- ----------- --------------------------------------------------
# Ajay Taneja      07-Aug-2012 Initial Version

#for accepting 549 as input
INCHAR="$1"

#for reading file
INPUT="$2"

#file for processing
OUTPUT1="$3"

#file2 for processing
OUTPUT2="$4"

#for fetching character after first 549 chars
tail +$1c $2 >$3

#for formatting data 100 char per line
fold -100 $3 > $4
echo "Line formatted"

cat /dev/null >$3
echo "File emptied"
count=1
export count
echo "About to enter while"
while read line 
do 
        echo "Line no is $count"
        xx=`echo "$line" |cut -c65-70`
        if test "$xx" = "CONTRA"
        then
                echo $xx
                break
        fi

        count=`expr $count + 1`
done<$4

#getting the exact data
echo $count
#head -$count $4>$3
echo "Job done"

RudiC · August 9, 2012, 11:37am

Does your sed offer extended regex, maybe not through -r but through another option character? Please check the man page again. What is your sed version?
I'd like to propose an awk solution, but as far as my awk version is concerned, there is no repeat factor like .{n} or .{n,m} available.

ataneja7 · August 9, 2012, 11:42am

My sed does not offer the -r option, and it does not have any extended regex option either.

Anyway, my objective is achieved. Can you help me keep a variable's value outside a loop?

Corona688 · August 9, 2012, 11:44am

Odd, I don't see anything that'd prevent variables in the loop from being seen outside it there.

In precisely what way is it misbehaving?

RudiC · August 9, 2012, 12:03pm

OK, which variable is losing its value outside your loop? I see $count and $xx being used and modified inside the loop, and $count, $3, and $4 used afterwards.

Aside, grep with option -n (for line no.)

grep -En "^.{66}CONTRA" infile
9:  5555555555555555555555555555555    5555555555555555 44444445 555CONTRA...

will give you the line no. (9) incorporating CONTRA.

ataneja7 · August 10, 2012, 1:21am

The command grep -En "^.{66}CONTRA" 2.txt displays the error below:
grep: illegal option -- E
Usage: grep -hblcnsviw pattern file . . .

The $count variable is losing its value after the loop.

nixie · August 10, 2012, 2:19am

Building on Franklin52's example... I found a syntax error as well :

awk '{if (substr($0,67,6)=="CONTRA"){exit} if($1 ~ /^[0-9]/){print }}' awktest.dat

Produces

  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXCONTRACTORS
  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXXXXXXXXXXXXXX
  5555555555555555555555555555555    5555555555555555 44444445 555AAAAAA            ZXXXXXXXXXXXXXXXXX

with the input you provided above. But note that the column numbers 65-70 you specified do not match your example. CONTRA begins in column 67.
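
One way to check which column CONTRA really starts in is awk's index function, which returns the 1-based position of the first match or 0 if there is none. A minimal sketch; contra_columns is a hypothetical name.

```shell
# contra_columns is a hypothetical helper name.
# Print "line:column" for the first occurrence of CONTRA on each line that has one.
contra_columns() {
    awk '{ col = index($0, "CONTRA"); if (col > 0) print NR ":" col }' "$@"
}
```

Running contra_columns awktest.dat lists every line containing the word, including lines such as ZXXXXCONTRACTORS where it sits in a different column, which makes it easy to pick the right start position for substr.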

awk '{if (substr($0,61,6)=="CONTRA"){exit} if($1 ~ /^[0-9]/){print }}' awktest.dat

will produce

5555555555555555555555555555555 5555555555555555 44444445 555AAAAAA ZXXXXCONTRACTORS
5555555555555555555555555555555 5555555555555555 44444445 555AAAAAA ZXXXXXXXXXXXXXXXXX
5555555555555555555555555555555 5555555555555555 44444445 555AAAAAA ZXXXXXXXXXXXXXXXXX

from the second example you provided - note again CONTRA is NOT in the columns you specified, but starts in Column 61.

This

awk '{if ($4 ~ /CONTRA$/){exit} if($1 ~ /^[0-9]/){print }}' awktest.dat

will work as long as CONTRA is at the end of the 4th field. It works with both examples.

The following line will cause it to break, because CONTRA is in the 5th field:

5555555555555555555555555555555 555555555 55555 44444445 555CONTRA ZXXXXXXXXXXXXXXXXX

awk '{if ($4 ~ /CONTRA$/){exit} if ($5 ~ /CONTRA$/){exit} if($1 ~ /^[0-9]/){print }}' awktest.dat

works with all examples

You can send the output to a file by adding >outfile.txt to the end of the command:

awk '{if ($4 ~ /CONTRA$/){exit} if($1 ~ /^[0-9]/){print }}' awktest.dat > outfile.txt

Hope that helps. If that doesn't solve the problem, please be a bit more specific about the actual format of the data and post some examples that I can test against.


RudiC · August 10, 2012, 3:56am

Hi ataneja7 ,
if I run the loop in your script, everything seems OK:

About to enter while
Line no is 1
Line no is 2
3
Job done

No loss of the count value. What's your output?
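
One possible explanation for the difference, offered as an assumption rather than a diagnosis: the script starts with #!/bin/sh, and the grep and awk errors earlier in the thread hint at an older system. In the traditional Bourne shell, a loop whose input is redirected with done < file is run in a subshell, so count would be lost there while surviving in bash or ksh. The sketch below sidesteps the question by redirecting standard input with exec before the loop; count_until_contra is a hypothetical name.

```shell
# count_until_contra is a hypothetical helper name; $1 is the folded file.
count_until_contra() {
    # Save the current stdin on descriptor 3, then read from the file instead.
    exec 3<&0 < "$1"
    count=1
    # As in the original script, read trims leading blanks from each line.
    while read line; do
        xx=`echo "$line" | cut -c65-70`
        if test "$xx" = "CONTRA"; then
            break
        fi
        count=`expr $count + 1`
    done
    # Restore the original stdin and close descriptor 3.
    exec 0<&3 3<&-
}
```

After the call, count holds the line number of the CONTRA line, so keeping only the lines before it would need head with count minus 1 rather than count.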

Too bad your system does not support extended regex at all, as it seems.