Replace numbers in records

Hello everyone,
I have a file with the following records:

begin
ASX120016719
ASX190006729
ASX153406729
ASX190406759
ASX180006739
end

Each record is the word ASX followed by 9 digits (no spaces).
What I want is to:

1- skip ASX
2- skip the first 2 digits after ASX
3- check digits 3-5; if they are all zeros, the record is OK, go to the next record
4- if any of digits 3-5 is nonzero, check digit 8:
if digit 8 is 1 or 2, set digits 3-5 to zeros
if digit 8 is NOT 1 or 2, the record is OK, go to the next record

According to the above records, the output should be:

begin
ASX120006719
ASX190006729
ASX150006729
ASX190406759
ASX180006739
end

I tried sed but could not figure out the right syntax.
How can I do this?

Thank you.

sed '
  /^ASX..000/!s/^\(.\{5\}\)\(...\)\(..[12].*\)/\1000\3/
  ' infile
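For readers puzzling over the command above, here is a small self-contained check against the sample from the question. The function name fix_short and the variable names are made up for this illustration; the sed script itself is the one from the answer.

```shell
# Sample records from the question.
input='begin
ASX120016719
ASX190006729
ASX153406729
ASX190406759
ASX180006739
end'

# fix_short is a hypothetical wrapper; it reads records on stdin.
fix_short() {
  # /^ASX..000/!   skip lines whose digits 3-5 are already 000
  # \(.\{5\}\)     ASX plus digits 1-2, kept as \1
  # \(...\)        digits 3-5, replaced by 000
  # \(..[12].*\)   digits 6-7, digit 8 (1 or 2) and the rest, kept as \3
  sed '/^ASX..000/!s/^\(.\{5\}\)\(...\)\(..[12].*\)/\1000\3/'
}

output=$(printf '%s\n' "$input" | fix_short)
printf '%s\n' "$output"
```

Lines that are shorter than the pattern, such as begin and end, cannot match and pass through untouched.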

Excellent solution, Radoulov.

I take my hat off to you.

Thanks Shell_Life,
hope it's working fine on the OP's machine :)

Thanks a lot, Radoulov. It works on HP-UX.
However, I used short records only to make my point. Since I'm confused by the sed script you applied, here is the exact number of digits in a record.

A record usually looks like this:

ASX123456789340123456789123456789123456789123456789212345

ASX(9 digits)(3 digits to be checked: all zeros or not)(36 digits)(1 digit to be checked: 1 or 2)(5 digits)

What would the sed script look like?
Thanks again.
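As a sketch only, the first sed command can be adapted to this layout by changing the counts: 12 characters (ASX plus 9 digits) are kept, the next 3 are the checked digits, then 36 digits, then the digit that must be 1 or 2. The function and variable names below are made up for illustration, and the layout is assumed to be exactly the one described in the post.

```shell
# Layout assumed from the post: ASX + 9 digits + 3 checked digits + 36 digits
# + 1 checked digit + 5 digits. fix_mid is a made-up name for this sketch.
fix_mid() {
  # 12 = 3 (ASX) + 9 leading digits; the 3 checked digits follow,
  # then 36 digits, then the digit that must be 1 or 2
  sed '/^ASX.\{9\}000/!s/^\(.\{12\}\)...\(.\{36\}[12].*\)/\1000\2/'
}

nine=123456789
body=$nine$nine$nine$nine          # the 36 digits in the middle

# checked digits 340, checked digit 2: the 340 should become 000
fixed=$(printf 'ASX%s340%s212345\n' "$nine" "$body" | fix_mid)
# checked digits 340, checked digit 5: the record should stay as it is
kept=$(printf 'ASX%s340%s512345\n' "$nine" "$body" | fix_mid)

printf '%s\n%s\n' "$fixed" "$kept"
```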

It would be easier if you post sample data and expected output.


Thank you radoulov, you are right. I thought the short records would be easier for the reader, but I was wrong.

We receive many files daily. The longest records we receive look like this:

 
ASX201123000000010000000302011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000010000000302011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000010000000302011282004130001001111114320000000200000000000000000010000002300000000100000000201001143211079191
ASX201123000000010000000302011282004130001001111114320000000200000000000000000010000002300000000100000000201001143211079191
ASX201123000000000000500002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000000000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000000000000302011282004130001001111114320000000200000000000000000010000002300000000100000000201001143211079191
ASX201123000000010000000302011282004130001001111114320000000200000000000000000010000002300000000100000000201001143211079191
ASX201123000000000000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000000000000302011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000001000000002011282004130001001111114320000000200000000000000000010000002300000000100000008101001143211079191
ASX201123000000010000004302011282004130001001111114320000000200000000000000000010000002300000000100000008101001143211079191

1- skip ASX
2- skip the first 16 digits after ASX
3- check digits 17-23; if they are all zeros, the record is OK, go to the next record
4- if any of digits 17-23 is nonzero, check digits 101 to 103:
if digits 101 to 103 are 001 or 002, set digits 17-23 to zeros
if digits 101 to 103 are NOT 001 or 002, the record is OK, go to the next record

The output should be:

 
ASX201123000000010000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000010000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000010000000002011282004130001001111114320000000200000000000000000010000002300000000100000000201001143211079191
ASX201123000000010000000002011282004130001001111114320000000200000000000000000010000002300000000100000000201001143211079191
ASX201123000000000000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000000000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000000000000002011282004130001001111114320000000200000000000000000010000002300000000100000000201001143211079191
ASX201123000000010000000002011282004130001001111114320000000200000000000000000010000002300000000100000000201001143211079191
ASX201123000000000000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000000000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000001000000002011282004130001001111114320000000200000000000000000010000002300000000100000008101001143211079191
ASX201123000000010000004302011282004130001001111114320000000200000000000000000010000002300000000100000008101001143211079191

Another question: how can I assign a record that meets the conditions to a variable? For example:
orgv=record whose digits 17 to 23 are not all zeros and whose digits 101 to 103 are 001 or 002

The variable is needed so that I can grep for it in the file and then use an if statement to run the action only when it matches (sometimes we receive files whose records are already correct and do not need to be changed).

We can't have this fixed at the source, because the sender sends these files to many receivers.

I hope I explained it well :). Thank you.

Well,
you won't need the variable because this sed script doesn't touch the correct records:

sed '/^ASX.\{16\}0\{7\}/!s/^\(.\{19\}\)\(.\{7\}\)\(.\{77\}00[12].*\)/\10000000\3/' infile

Let me know if it works (hope I got the numbers right :)).
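The counts in that command follow from the layout in the question: 3 characters for ASX plus 16 digits gives 19, then come the 7 digits to check, then digits 24 to 100 (77 digits), then 001 or 002. A quick sanity check with synthetic records, padded with printf so the widths do not have to be typed by hand (the record contents and variable names are made up for illustration):

```shell
# Synthetic record layout: ASX + 16 digits + 7 digits + 77 digits + 3 digits + tail.
head16=$(printf '%016d' 0)   # digits 1-16, all zeros
mid77=$(printf '%077d' 0)    # digits 24-100, all zeros

# The sed script from the post above, stored in a variable for reuse.
fix='/^ASX.\{16\}0\{7\}/!s/^\(.\{19\}\)\(.\{7\}\)\(.\{77\}00[12].*\)/\10000000\3/'

# digits 17-23 nonzero, digits 101-103 = 001: the 7 digits should be zeroed
r1=$(printf 'ASX%s0000030%s001999\n' "$head16" "$mid77" | sed "$fix")
# digits 17-23 nonzero, digits 101-103 = 008: the record should be left alone
r2=$(printf 'ASX%s0000430%s008999\n' "$head16" "$mid77" | sed "$fix")

printf '%s\n%s\n' "$r1" "$r2"
```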

# sed 's/^\(ASX.\{16\}\)[0-9]\{7\}\(.\{77\}00[1-2].*\)/\10000000\2/' infile
ASX201123000000010000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000010000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000010000000002011282004130001001111114320000000200000000000000000010000002300000000100000000201001143211079191
ASX201123000000010000000002011282004130001001111114320000000200000000000000000010000002300000000100000000201001143211079191
ASX201123000000000000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000000000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000000000000002011282004130001001111114320000000200000000000000000010000002300000000100000000201001143211079191
ASX201123000000010000000002011282004130001001111114320000000200000000000000000010000002300000000100000000201001143211079191
ASX201123000000000000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000000000000002011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191
ASX201123000000001000000002011282004130001001111114320000000200000000000000000010000002300000000100000008101001143211079191
ASX201123000000010000004302011282004130001001111114320000000200000000000000000010000002300000000100000008101001143211079191

Good point @ygemici,
the second grouping is unnecessary:

sed '/^ASX.\{16\}0\{7\}/!s/^\(.\{19\}\).\{7\}\(.\{77\}00[12].*\)/\10000000\2/' infile

Your code will do some redundant substitutions (0000000 to 0000000),
but I suppose that's not a problem.

And given the fixed format, I think this should suffice:

sed 's/^\(ASX.\{16\}\).\{7\}\(.\{77\}00[12].*\)/\10000000\2/' infile

@radoulov
Yes, for this input it can be converted to your command :)
Regards

Thanks radoulov, ygemici.
I'll try the command as soon as I get on the machine and let you know (I'm sure it will work like the first one).

I need the variable (or any other logical way):
1- to run sed ONLY when at least one record needs to be changed,
2- to know whether sed changed any records in the file. After sed runs inside the script, the entry written to the script's log file should be either (Some records in $Filename have been changed) or (No records in $Filename have been changed).

You could try something like this (untested):

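for f in *; do
  # count the lines that should be changed:
  # 001 or 002 in digits 101-103, but digits 17-23 not all zeros
  _count=$(
    grep -E '^.{103}00[12]' "$f" |
      grep -Evc '^ASX.{16}0{7}'
  )
  if [ "$_count" -ne 0 ]; then
    # sed writes the result to stdout; redirect it to a temporary file
    # and mv that over "$f" to update the file itself
    sed '/^ASX.\{16\}0\{7\}/!s/^\(.\{19\}\).\{7\}\(.\{77\}00[12].*\)/\10000000\2/' "$f"
    printf '%s: records changed: %d\n' "$f" "$_count" >> logfile
  else
    printf '%s: no change\n' "$f" >> logfile
  fi
done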
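A different way to get the "changed or not" answer, offered only as a sketch: run sed into a temporary file and let cmp decide whether anything differs. This is not the approach from the post above. The function name fix_file and its arguments are hypothetical, and mktemp is assumed to be available (it is common but not part of POSIX).

```shell
# Hypothetical helper: fix one file in place and log whether anything changed.
# Usage: fix_file FILE LOGFILE
fix_file() {
  file=$1
  log=$2
  tmp=$(mktemp) || return 1   # assumption: mktemp exists on this system
  sed '/^ASX.\{16\}0\{7\}/!s/^\(.\{19\}\).\{7\}\(.\{77\}00[12].*\)/\10000000\2/' "$file" > "$tmp"
  if cmp -s "$file" "$tmp"; then
    # identical output: no record needed fixing, leave the file alone
    printf 'No records in %s have been changed\n' "$file" >> "$log"
  else
    # copy the content back so the original file keeps its permissions
    cat "$tmp" > "$file"
    printf 'Some records in %s have been changed\n' "$file" >> "$log"
  fi
  rm -f "$tmp"
}
```

It could be called from a loop such as: for f in /some/dir/REV*; do fix_file "$f" /some/dir/rev.log; done (both paths are placeholders). The trade-off is that sed runs on every file, including the ones that are already correct.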

Thank you radoulov, the code is great, but I would have to change many places. Let me describe an idea I used before; I hope it works now.
A long time ago, when the condition was to check for one specific record (only 0000020 together with 001 or 002), I did something like the following.

(I'm not at the machine now, so the syntax is not 100% right; I'll just explain the idea.)

orgv='ASX20112300000001000000020.{77}00[12]'
for filename in *    # each file in the directory
do
orgc=$(grep -Ec "$orgv" "$filename")

if [ "$orgc" -ge 1 ]
then
sed .....
echo "Some records in $filename have been changed" >> logfile
else
echo "No records in $filename have been changed" >> logfile
fi
done

It worked fine.
What do you think? And if it is OK:

  • How do I write the pattern for the orgv variable for the new condition (any nonzero digit in the 7 digits, together with 001 or 002)?

Thank you.

We'd need to see the actual code, actual input data, and actual expected result to tell if your code actually does what you think it does.

Sorry for the late reply, I just logged in to the machine.

This is the case where we checked for one specific record:

ASX201123000000010000000202011282004130001001111114320000000200000000000000000010000002300000000100000000101001143211079191

(ONLY when the 7 digits are 0000020 and digits 101 to 103 are 001 or 002 do we change 0000020 to 0000000.)

The actual code was:

#!/usr/bin/sh
# ERE for grep -E: ASX, 16 digits, 0000020, 77 digits, then 001 or 002
orgv='ASX.{16}0000020.{77}00[12]'
toproc="/REV/TO_PROCESS"
lgfile="/REV/scripts/rev.log"
for revfile3 in "$toproc"/REV*
do
    # number of records in this file that need fixing
    orgc=$(grep -Ec "$orgv" "$revfile3")
    if [ "$orgc" -ge 1 ]
    then
        # on matching lines, replace character 25 (the 2 in 0000020) with 0
        sed '/^ASX.\{16\}0000020.\{77\}00[12]/s/./0/25' "$revfile3" > "$toproc/temp"
        mv "$toproc/temp" "$revfile3"
        echo "File $revfile3 has some values that have been changed" >> "$lgfile"
    fi
done
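The s/./0/25 in that script relies on the numeric flag of the s command: a number N after the last delimiter replaces only the Nth match on the line. Since . matches any single character, the 25th match is simply character 25, which in these records is the 2 of 0000020 (3 characters for ASX, 16 digits, then the sixth of the 7 checked digits). A tiny illustration on the alphabet, where character 25 is the letter y:

```shell
# Replace only the 25th character of the line with 0.
out=$(printf 'abcdefghijklmnopqrstuvwxyz\n' | sed 's/./0/25')
printf '%s\n' "$out"   # abcdefghijklmnopqrstuvwx0z
```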

The above code is tested and works fine.

NOW the condition has changed: when the 7 digits (digits 17-23) contain a nonzero digit and digits 101 to 103 are 001 or 002, the 7 digits must be set to zeros.

radoulov and ygemici kindly gave me the sed command for the new case:

sed 's/^\(ASX.\{16\}\).\{7\}\(.\{77\}00[12].*\)/\10000000\2/' infile

which works fine.

radoulov also gave me code to test the records before running the sed command, which works great.

Now, out of curiosity, I would like a pattern that I can assign to the orgv variable as I did before. I mean something like this:

orgv=ASX................(any number in the 7 digits).\{77\}00[12]

Also, can I log the numbers of the lines affected by sed? If sed changed lines 5, 6 and 8 in a file, can I show those line numbers in the log?
Thank you.
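One possible pattern for orgv, as a sketch: extended regular expressions have no direct way to say "7 digits that are not all zeros", so the field can be spelled out as "a digit 1-9 in position 1, or in position 2, and so on up to position 7". The names nonzero7 and count_bad are made up for this illustration; the pattern assumes the fixed layout described earlier in the thread and that the field contains only digits.

```shell
# Each alternative is exactly 7 characters long and has a 1-9 in one position.
nonzero7='[1-9].{6}|.[1-9].{5}|.{2}[1-9].{4}|.{3}[1-9].{3}|.{4}[1-9].{2}|.{5}[1-9].|.{6}[1-9]'

# ASX, 16 digits, 7 digits that are not all zeros, 77 digits, then 001 or 002.
orgv="^ASX.{16}(${nonzero7}).{77}00[12]"

# Hypothetical helper: number of records in FILE that still need fixing.
count_bad() {
  grep -Ec "$orgv" "$1"
}
```

In the earlier script this would slot in as orgc=$(count_bad "$revfile3"), followed by the same test on orgc as before.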

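You can. This grep/cut pipeline lists the numbers of the lines sed would change, that is, lines with 001 or 002 in digits 101 to 103 whose digits 17 to 23 are not all zeros:

grep -En '^.{103}00[12]' infile | grep -Ev '^[0-9]+:ASX.{16}0{7}' | cut -d: -f1

Note that egrep is not standard (grep -E is), which is why grep -E is used here.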
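A sed-only variation, as a sketch: the = command prints the current line number, so it can be applied to exactly the lines that have 001 or 002 in digits 101 to 103 and digits 17 to 23 that are not all zeros, which are the lines the substitution would rewrite. The function names are hypothetical; paste -s is used to join the numbers into one comma-separated string for the log.

```shell
# Hypothetical helper: print the numbers of the lines that need fixing,
# one number per line.
changed_lines() {
  # skip lines whose digits 17-23 are all zeros; for the remaining lines,
  # print the line number (=) when digits 101-103 are 001 or 002
  sed -n '/^ASX.\{16\}0\{7\}/!{
    /^.\{103\}00[12]/=
  }' "$1"
}

# Hypothetical helper: the same numbers joined with commas, e.g. 5,6,8
changed_list() {
  changed_lines "$1" | paste -sd, -
}
```

A log entry could then be written with something like printf '%s: changed lines: %s\n' "$f" "$(changed_list "$f")" >> logfile, run before the file is rewritten.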