Need help with find and replace on a specific line and position

Rajesh_us · January 16, 2014, 9:19pm

Many, many thanks, Don Cragun! You guys rock like anything.

Akshay_Hegde · January 16, 2014, 9:54pm

Some examples might help you understand:

# Replace 3rd char with star
$ echo "1234567890" | awk -F "" '{gsub(".","*",$3)}1' OFS=""
12*4567890

# Replace 5th char onwards with some string
$ echo "1234567890" | awk  '{$0=substr($0,1,5) New}1' New="tester"
12345tester

Read the following description:

substr(string, start, length) returns a length-character-long substring of string, starting at character number start. The first character of a string is character number one.

Example: substr("washington", 5, 3) returns "ing".

If length is not present, this function returns the whole suffix of string that begins at character number start: substr("washington", 5) returns "ington". This is also the case if length is greater than the number of characters remaining in the string, counting from character number start.
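The substr() forms described above can be checked directly from the command line. A minimal sketch that should behave the same in any standards-conforming awk:

```shell
# Three-argument form: 3 characters starting at position 5.
awk 'BEGIN { print substr("washington", 5, 3) }'     # prints: ing

# Two-argument form: everything from position 5 to the end.
awk 'BEGIN { print substr("washington", 5) }'        # prints: ington

# A length past the end of the string just returns what remains.
awk 'BEGIN { print substr("washington", 5, 100) }'   # prints: ington
```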

Don_Cragun · January 16, 2014, 10:34pm

Note that the standards say that the results are unspecified if an empty string is assigned to FS. Some versions of awk will produce the output you showed above in your first example. The version of awk on OS/X issues the warning and output:

awk: field separator FS is empty

1234567890

A better example might be:

echo "1234567890" | awk '{print substr($0,1,2)"*"substr($0,4)}'

which should produce the output:

12*4567890

with any version of awk that conforms to the standards.

And, in your 2nd example, the comment says you're replacing the 5th character, but the substr() is copying the 1st five characters unchanged.
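Both cases can be written with substr() alone, so no empty field separator is needed. A minimal sketch; the variable names pos and rep are only illustrative and are passed in with -v:

```shell
# Replace only the character at position pos (here the 3rd) with rep.
echo "1234567890" |
awk -v pos=3 -v rep="*" '{ print substr($0, 1, pos - 1) rep substr($0, pos + 1) }'
# prints: 12*4567890

# Replace everything from position pos onwards (here the 5th) with rep;
# only the first pos - 1 characters are kept.
echo "1234567890" |
awk -v pos=5 -v rep="tester" '{ print substr($0, 1, pos - 1) rep }'
# prints: 1234tester
```

Using substr($0, 1, pos) in the second command instead would keep the 5th character as well and print 12345tester.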

Akshay_Hegde · January 16, 2014, 11:02pm

Don, I don't know about OS/X. In the second example I meant to say: retain the first five characters and replace everything after the 5th character, up to the end of the line, with some string. Sorry for my bad English.

Don_Cragun · January 17, 2014, 12:39am

I'm not concerned about your English. You are much better at English than I am in your native language! (When I attended college, I was allowed to use proficiency in FORTRAN and CDC assembler to meet MSU's requirement to understand a foreign language.)

I only mentioned the -F "" issue because the O/S being used by the submitter of this thread was never specified. The code you gave will work fine with gawk (or awk on a Linux system), but it is not portable to other systems since the standards explicitly state that the results are unspecified in this case. (OS/X is just one example of a system where an empty string is not accepted by awk as a field separator.)

Akshay_Hegde · January 17, 2014, 6:52am

@ Don

You are right, as always! I tested it on GNU Awk 3.1.7, and it worked there without warnings.

Rajesh_us · January 29, 2014, 2:34pm

awk  -v k3="$k3" '
    BEGIN{
      F="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
      T="ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
      for(i=1;i<=length(F);i++)N[substr(F,i,1)]=substr(T,i,1);
    }
    NR == k3 {
      out=x
      for(i=1;i<=length;i++) {
         c=substr($0,i,1)
         out=out (i>9&&(c in N)?N[c]:c)
      }
      $0=out
    }
    1
' $file

I don't have any issue with the above code. Is it possible to make it work on a single field, say field $3? First it has to find a specific field ($3); then, in field 3, it has to replace from the 9th position to the end of the line.

Chubler_XL · January 29, 2014, 5:08pm

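This should match on field three. Notice that to match a literal "*" (star) character, two backslashes are needed: awk processes escape sequences in a -v assignment, so only one backslash is left by the time the value is used as a regular expression.

The star on the end is meant to match the rest of the string, so the example is looking for any field 3 that starts with "NM1*IL*1". (Strictly speaking, a trailing star in a regular expression means "zero or more of the preceding character" and the match is not anchored, so the star can be dropped, and a leading ^ is needed to insist that the match is at the start of the field.)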

k3='NM1\\*IL\\*1*'
awk -v k3="$k3" '
    BEGIN{
      F="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
      T="ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
      for(i=1;i<=length(F);i++)N[substr(F,i,1)]=substr(T,i,1);
    }
    $3 ~ k3 {
      out=x
      for(i=1;i<=length($3);i++) {
         c=substr($3,i,1)
         out=out (i>9&&(c in N)?N[c]:c)
      }
      $3=out
    }
    1
' $file
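To see what the pattern actually becomes, it can help to print it and try it on a few strings. A small sketch, assuming a standards-conforming awk; the three test strings are made up for illustration and the whole line ($0) is tested rather than field 3:

```shell
k3='NM1\\*IL\\*1*'

# awk collapses each pair of backslashes to one while processing the
# -v assignment, so the regular expression it ends up with is NM1\*IL\*1*
awk -v k3="$k3" 'BEGIN { print k3 }'
# prints: NM1\*IL\*1*

# In that regex, backslash-star is a literal star, the final 1* means
# "zero or more 1s", and the match is not anchored to the start.
printf '%s\n' 'NM1*IL*1' 'xxNM1*IL*2' 'NM1-IL-1' |
awk -v k3="$k3" '{ print $0, ($0 ~ k3 ? "match" : "no match") }'
# prints:
#   NM1*IL*1 match
#   xxNM1*IL*2 match
#   NM1-IL-1 no match
```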

Rajesh_us · January 29, 2014, 5:50pm

Sorry, I gave the wrong information. Here is what I wanted; please refer to the updated version.

I don't have any issue with the above code. Is it possible to make it work on a single field, say field $3? First it has to find a specific field ($3); then, in field 3, it has to replace from the 9th position to the end of the field.

First I need to locate the field I want; it could be field 3 or 4 or 5 or anything.
Then, on the located field, I need to replace from a start position to an end position, say 9 to 12, or 5 to the end of that specific field.

Don_Cragun · January 29, 2014, 6:45pm

You have changed your requirements so many times that I have absolutely no idea what you want to do. Please give us a complete set of requirements.

Are we processing a single line in a file or every line in a file? If it is a single line, how do we identify which line you want to process? If some lines are not to be processed, are they copied unchanged from the input to the output, or are they removed from the output?

What constitutes a field?

Are you trying to change data from a certain point in a field to the end of that field, to the end of that line, for some number of characters in that field, or for some number of characters in that line?

Rajesh_us · January 29, 2014, 7:23pm

Sorry for the inconvenience!

I will identify a few lines in the file based on some predefined strings. On each identified line, I need to manipulate only a few fields. Again, within those fields, I will manipulate only a certain range of positions. Yes, unprocessed input will be copied to the output.

What the previous requirement was:

Input

NM1*IL*1*AYNCH*AILLIAM****AI*R02089540

output

NM1*IL*1*ZBMXS*ZROORZN****ZR*I97910459

After "1*", everything is manipulated.

What the current requirement is:

Example 1

input

NM1*IL*1*AYNCH*AILLIAM****AI*R02089540

output

NM1*IL*1*AYNCH*ZROOIAM****AI*R02080540

In the fifth field, positions 1 (A) through 4 (L) need to be converted.

Example 2

Input

NM1*IL*1*AYNCH*AILLIAM****AI*R02089540

output

NM1*IL*1*AYNCH*ZROORZN****AI*R02080540

In the fifth field, I'm converting everything.

Chubler_XL · January 29, 2014, 7:49pm

Looks like some more flexibility in the code could help you change things to get what you require.

Start with this for example 2 (Change all of field 5):

awk -F\* '
function rot(string,start,end,i,ret) {
   if(!end) end=length(string);
   if(!start) start=1;
   for(i=start;i<=length(string);i++) {
        c=substr(string,i,1)
        ret=ret (i>=start&&i<=end&&(c in N)?N[c]:c)
   }
   return ret
}
BEGIN{
    OFS=FS
    F="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    T="ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
    for(i=1;i<=length(F);i++)N[substr(F,i,1)]=substr(T,i,1);
}

{ $5=rot($5) } 1 ' $file

Change to this for example 1 (chars 1-4 of field 5):

awk -F\* '
function rot(string,start,end,i,ret) {
   if(!end) end=length(string);
   if(!start) start=1;
   for(i=start;i<=length(string);i++) {
        c=substr(string,i,1)
        ret=ret (i>=start&&i<=end&&(c in N)?N[c]:c)
   }
   return ret
}
BEGIN{
    OFS=FS
    F="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    T="ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
    for(i=1;i<=length(F);i++)N[substr(F,i,1)]=substr(T,i,1);
}

{ $5=rot($5,1,4) } 1 ' $file

Rajesh_us · January 29, 2014, 9:36pm

Where should I include the line number? I need to

Something like this

 NR == k3

or

k3='NM1\\*IL\\*1*'
awk -v k3="$k3" '

I also tried running the above command with the line number in it (I'm not sure I did it correctly). Here is the output from the terminal:

+ awk '-F*' k3=19 '
function rot(string,start,end,i,ret) NR == k3 {
   if(!end) end=length(string);
   if(!start) start=1;
   for(i=start;i<=length(string);i++) {
        c=substr(string,i,1)
        ret=ret (i>=start&&i<=end&&(c in N)?N[c]:c)
   }
   return ret
}
BEGIN{
    OFS=FS
    F="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    T="ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
    for(i=1;i<=length(F);i++)N[substr(F,i,1)]=substr(T,i,1);
}

{ $5=rot($5,1,5) } 1 ' Xdata_6513155487964.xml-3C488ABB-22C2-4925-A6FF-33CF6A767268.dat
awk: cmd. line:1: fatal: cannot open file `

The code I ran:

awk  -F\*  k3="$k3" '
function rot(string,start,end,i,ret) NR == k3 {
   if(!end) end=length(string);
   if(!start) start=1; 
   for(i=start;i<=length(string);i++) {
        c=substr(string,i,1)
        ret=ret (i>=start&&i<=end&&(c in N)?N[c]:c)
   }
   return ret
}
BEGIN{
    OFS=FS
    F="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    T="ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
    for(i=1;i<=length(F);i++)N[substr(F,i,1)]=substr(T,i,1);
}

{ $5=rot($5,1,5) } 1 ' $file

Chubler_XL · January 29, 2014, 10:28pm

Like this:

k3=17
awk -F\* -v k3="$k3" '
...
NR == k3 { $5=rot($5) } 1 ' $file

or this

k3='NM1\\*IL\\*1*'
awk -F\* -v k3="$k3" '
...
$0 ~ k3 { $5=rot($5,1,4) } 1' $file

Rajesh_us · January 29, 2014, 11:59pm

It works. Just thanks is not enough for you guys, but for now, thanks a lot.

---------- Post updated at 11:59 PM ---------- Previous update was at 10:40 PM ----------

I'm afraid to ask a question again. I'm not adding any new requirements, but I have a few more cases (scenarios). This will be my last question.

Everything you provided works fine.

Here I have two more scenarios:

Example 3

input

NM1*IL*1*AYNCH*AILLIAM****AI*R02089540

output

NM1*IL*1*AYNCH*AIOORZN****AI*R02080540

Example 4

input

NM1*IL*1*AYNCH*AILLIAM****AI*R02089540

output

NM1*IL*1*AYNCH*AIOOIAM****AI*R02080540

Don_Cragun · January 30, 2014, 12:33am

I would have thought this would be obvious by now, but:

$5=rot($5)

replaces the 5th field with every alphanumeric character (starting at the beginning of the field through the end of the field) rotated.

$5=rot($5,1,4)

replaces the 5th field with up to 4 alphanumeric characters starting in position 1 rotated.

$5=rot($5,3)

replaces the 5th field with every alphanumeric character starting in position 3 (to the end of the field) rotated (as in example 3). And,

$5=rot($5,3,2)

replaces the 5th field with up to 2 alphanumeric characters starting in position 3 rotated (as in example 4).

Rajesh_us · January 30, 2014, 8:13am

I tried it

Example3:

$5=rot($5,3)

Input:
AILLIAM
Output:
OORZN

Starting from position 3, it converts everything, but the first two characters got removed.

Example4:

$5=rot($5,2,3)

Input:
AILLIAM
Output:
ROLIAM

The first letter is missing, and then it converts only 2 letters (positions 2 and 3).

Don_Cragun · January 30, 2014, 10:57am

In Chubler_XL's function rot() , change the line:

   for(i=start;i<=length(string);i++) {

to:

   for(i=length(string);i;i--) {

Rajesh_us · January 30, 2014, 12:04pm

i changed it

$5=rot($5,2,3)

Input :

03079

output :

97960

what i'm expecting

06929

Don_Cragun · January 30, 2014, 12:57pm

I have been fighting a losing battle with the flu. I didn't realize it had affected my ability to read and write code. Please accept my humble apologies.

The arguments to Chubler_XL's rot() function are:

rot(string_to_process, starting_position, ending_position)

where starting_position and ending_position are offsets in string (with offset 1 being the 1st character in string ). If ending_position is not supplied, it defaults to the offset of the last character in string . If neither starting_position nor ending_position are supplied, rot() acts on every character in string .

The call to rot() for your example 3 is still:

$5=rot($5,3)

but the call to rot() for your example 4 should be:

$5=rot($5,2,4)

And, change my earlier (mistaken) fix to Chubler_XL's code to be:

   for(i=1;i<=length(string);i++) {
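Putting the pieces together, a complete sketch with that loop line in place might look like the following. The sample line comes from the examples earlier in this thread. Two things here are not in the code as posted and are only assumptions for the sketch: c is declared as an extra local parameter, and the line selection is written as a regular expression literal instead of being passed in through k3.

```shell
# Sample input from the thread; fields are separated by "*".
line='NM1*IL*1*AYNCH*AILLIAM****AI*R02089540'

result=$(printf '%s\n' "$line" | awk -F'*' '
# rot(string, start, end): swap each alphanumeric character between
# positions start and end (inclusive) using the table N; characters
# outside that range are copied unchanged.
# i, c and ret are extra parameters used as local variables.
function rot(string, start, end,    i, c, ret) {
    if (!end)   end = length(string)
    if (!start) start = 1
    for (i = 1; i <= length(string); i++) {
        c = substr(string, i, 1)
        ret = ret (i >= start && i <= end && (c in N) ? N[c] : c)
    }
    return ret
}
BEGIN {
    OFS = FS
    F = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    T = "ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
    for (i = 1; i <= length(F); i++) N[substr(F, i, 1)] = substr(T, i, 1)
}
# Only lines beginning with NM1*IL*1* are changed; all others pass through.
/^NM1\*IL\*1\*/ { $5 = rot($5, 3, 4) }
1')

echo "$result"
# prints: NM1*IL*1*AYNCH*AIOOIAM****AI*R02089540
```

Because the loop now always walks the whole string, characters before start are copied through instead of being dropped, and start and end only decide which positions are converted.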

Sorry.

I'm going back to bed.