Many thanks, Don Cragun! You guys rock.
Some examples might help you understand:
# Replace 3rd char with star
$ echo "1234567890" | awk -F "" '{gsub(".","*",$3)}1' OFS=""
12*4567890
# Replace 5th char onwards with some string
$ echo "1234567890" | awk '{$0=substr($0,1,5) New}1' New="tester"
12345tester
Read the description below:
substr(string, start, length)
This returns a length-character-long substring of string, starting at character number start. The first character of a string is character number one.
Example:
substr("washington", 5, 3)
returns "ing". If length is not present, this function returns the whole suffix of string that begins at character number start.
substr("washington", 5)
returns "ington". This is also the case if length is greater than the number of characters remaining in the string, counting from character number start.
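The three cases described above can be checked in one go. This is only a small sketch; the wrapper name substr_demo is made up for illustration and is not part of awk.

```shell
# substr_demo is a hypothetical wrapper name used only for this illustration.
substr_demo() {
  awk 'BEGIN {
    s = "washington"
    print substr(s, 5, 3)     # three characters starting at position 5
    print substr(s, 5)        # no length: everything from position 5 on
    print substr(s, 5, 100)   # length past the end: same as omitting it
  }'
}
substr_demo
```

This should print "ing", then "ington" twice.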
Note that the standards say that the results are unspecified if an empty string is assigned to FS. Some versions of awk will produce the output you showed above in your first example. The version of awk on OS/X issues the warning and output:
awk: field separator FS is empty
1234567890
A better example might be:
echo "1234567890" | awk '{print substr($0,1,2)"*"substr($0,4)}'
which should produce the output:
12*4567890
with any version of awk that conforms to the standards.
And, in your 2nd example, the comment says you're replacing the 5th character, but the substr() is copying the 1st five characters unchanged.
Don, I don't know about OS/X. In the second example I meant to retain the first five characters and replace everything after the 5th character, up to the end of the line, with some string. Sorry for my bad English.
I'm not concerned about your English. You are much better at English than I am in your native language! (When I attended college, I was allowed to use proficiency in FORTRAN and CDC assembler to meet MSU's requirement to understand a foreign language.)
I only mentioned the -F "" issue because the O/S being used by the submitter of this thread was never specified. The code you gave will work fine with gawk (or awk on a Linux system), but it is not portable to other systems since the standards explicitly state that the results are unspecified in this case. (OS/X is just one example of a system where an empty string is not accepted by awk as a field separator.)
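For anyone who wants the position to be a parameter, the substr() approach generalizes without relying on an empty FS. The following is a minimal sketch; the function name replace_nth and the variable names n and r are assumptions chosen for this example.

```shell
# replace_nth is a hypothetical helper: $1 = 1-based position, $2 = replacement text.
# It reads lines on standard input and uses only substr(), so no empty FS is needed.
replace_nth() {
  awk -v n="$1" -v r="$2" '{ print substr($0, 1, n - 1) r substr($0, n + 1) }'
}
echo "1234567890" | replace_nth 3 "*"
```

This should print 12*4567890. Note that a line shorter than n characters simply gets the replacement text appended, and that backslashes in a value passed with -v are interpreted as escape sequences.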
@Don
You are right, as always! I tested it on GNU Awk 3.1.7, and it worked there without warnings.
awk -v k3="$k3" '
BEGIN{
F="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
T="ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
for(i=1;i<=length(F);i++)N[substr(F,i,1)]=substr(T,i,1);
}
NR == k3 {
out=x
for(i=1;i<=length;i++) {
c=substr($0,i,1)
out=out (i>9&&(c in N)?N[c]:c)
}
$0=out
}
1
' $file
I don't have any issue with the above code. Is it possible to make it work on a single field, say field $3? First it has to search a specific field ($3); then, in field 3, it has to replace from the 9th position to the end of the line.
This should match on field three. Notice that two backslashes are needed to match a literal "*" (star) character.
The star on the end is a regular-expression quantifier (zero or more of the preceding "1"); because the match is not anchored, the rest of the string may follow, so the example looks for any field 3 that contains "NM1*IL*".
k3='NM1\\*IL\\*1*'
awk -v k3="$k3" '
BEGIN{
F="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
T="ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
for(i=1;i<=length(F);i++)N[substr(F,i,1)]=substr(T,i,1);
}
$3 ~ k3 {
out=x
for(i=1;i<=length($3);i++) {
c=substr($3,i,1)
out=out (i>9&&(c in N)?N[c]:c)
}
$3=out
}
1
' $file
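If the selection string is meant literally, one way to sidestep the regular-expression escaping is index(), which does a plain substring search. This is a sketch under that assumption; the helper name match_prefix and the second sample line are invented for the illustration.

```shell
# match_prefix is a hypothetical helper: $1 = literal prefix to look for.
# index($0, p) == 1 is true only when the line starts with p; no regex is involved.
match_prefix() {
  awk -v p="$1" 'index($0, p) == 1'
}
printf '%s\n' 'NM1*IL*1*AYNCH*AILLIAM' 'NM1*XX*1*OTHER' | match_prefix 'NM1*IL*1*'
```

Only the first sample line should be printed. Backslashes in the prefix would still be processed by -v, so this fits best when the prefix contains none.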
Sorry, I gave wrong information. Here is what I wanted; please refer to the updated version:
I don't have any issue with the above code. Is it possible to make it work on a single field, say field $3? First it has to search a specific field ($3); then, in field 3, it has to replace from the 9th position to the end of the field.
First I need to locate the field I want. It could be field 3 or 4 or 5 or any other.
Then, on the located field, I need to replace from a start position to an end position, say 9 to 12, or 5 to the end of that specific field.
You have changed your requirements so many times that I have absolutely no idea what you want to do. Please give us a complete set of requirements.
Are we processing a single line in a file or every line in a file? If it is a single line, how do we identify which line you want to process? If some lines are not to be processed, are they copied unchanged from the input to the output, or are they removed from the output?
What constitutes a field?
Are you trying to change data from a certain point in a field to the end of that field, to the end of that line, for some number of characters in that field, or for some number of characters in that line?
Sorry for the inconvenience!
I will identify a few lines in the file based on some predefined strings. On each identified line, I need to manipulate only a few fields. Within those fields, I will manipulate only a certain range of positions. Yes, unprocessed input will be copied to the output.
What the previous requirement was:
Input
NM1*IL*1*AYNCH*AILLIAM****AI*R02089540
output
NM1*IL*1*ZBMXS*ZROORZN****ZR*I97910459
After "1*", everything is manipulated.
What the current requirement is:
Example 1
input
NM1*IL*1*AYNCH*AILLIAM****AI*R02089540
output
NM1*IL*1*AYNCH*ZROOIAM****AI*R02080540
On the fifth field, characters 1 (A) to 4 (L) need to be converted.
Example 2
Input
NM1*IL*1*AYNCH*AILLIAM****AI*R02089540
output
NM1*IL*1*AYNCH*ZROORZN****AI*R02080540
On fifth field I'm converting everything.
Looks like some more flexibility in the code could help you change things to get what you require.
Start with this for example 2 (Change all of field 5):
awk -F\* '
function rot(string,start,end,i,ret) {
if(!end) end=length(string);
if(!start) start=1;
for(i=start;i<=length(string);i++) {
c=substr(string,i,1)
ret=ret (i>=start&&i<=end&&(c in N)?N[c]:c)
}
return ret
}
BEGIN{
OFS=FS
F="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
T="ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
for(i=1;i<=length(F);i++)N[substr(F,i,1)]=substr(T,i,1);
}
{ $5=rot($5) } 1 ' $file
Change to this for example 1 (chars 1-4 of field 5):
awk -F\* '
function rot(string,start,end,i,ret) {
if(!end) end=length(string);
if(!start) start=1;
for(i=start;i<=length(string);i++) {
c=substr(string,i,1)
ret=ret (i>=start&&i<=end&&(c in N)?N[c]:c)
}
return ret
}
BEGIN{
OFS=FS
F="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
T="ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
for(i=1;i<=length(F);i++)N[substr(F,i,1)]=substr(T,i,1);
}
{ $5=rot($5,1,4) } 1 ' $file
Where should I include the line number? I need to
Something like this
NR == k3
or
k3='NM1\\*IL\\*1*'
awk -v k3="$k3" '
I tried running the above command with the line number in it (I'm not sure I did it correctly). Here is the output from the terminal:
+ awk '-F*' k3=19 '
function rot(string,start,end,i,ret) NR == k3 {
if(!end) end=length(string);
if(!start) start=1;
for(i=start;i<=length(string);i++) {
c=substr(string,i,1)
ret=ret (i>=start&&i<=end&&(c in N)?N[c]:c)
}
return ret
}
BEGIN{
OFS=FS
F="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
T="ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
for(i=1;i<=length(F);i++)N[substr(F,i,1)]=substr(T,i,1);
}
{ $5=rot($5,1,5) } 1 ' Xdata_6513155487964.xml-3C488ABB-22C2-4925-A6FF-33CF6A767268.dat
awk: cmd. line:1: fatal: cannot open file `
The code I ran:
awk -F\* k3="$k3" '
function rot(string,start,end,i,ret) NR == k3 {
if(!end) end=length(string);
if(!start) start=1;
for(i=start;i<=length(string);i++) {
c=substr(string,i,1)
ret=ret (i>=start&&i<=end&&(c in N)?N[c]:c)
}
return ret
}
BEGIN{
OFS=FS
F="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
T="ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
for(i=1;i<=length(F);i++)N[substr(F,i,1)]=substr(T,i,1);
}
{ $5=rot($5,1,5) } 1 ' $file
Like this:
k3=17
awk -F\* -v k3="$k3" '
...
NR == k3 { $5=rot($5) } 1 ' $file
or this
k3='NM1\\*IL\\*1*'
awk -F\* -v k3="$k3" '
...
$0 ~ k3 { $5=rot($5,1,4) } 1' $file
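The key points are that the variable is passed with -v and that the condition goes in front of the action block, not on the function line. Here is a small self-contained sketch of the line-number form; toupper() stands in for rot(), and the name mark_line and the sample data are made up for the demonstration.

```shell
# mark_line is a hypothetical helper: $1 = number of the line to change.
# The pattern "NR == k3" guards the action; the trailing 1 prints every line.
mark_line() {
  awk -F'*' -v k3="$1" 'BEGIN { OFS = FS } NR == k3 { $5 = toupper($5) } 1'
}
printf '%s\n' 'a*b*c*d*eee' 'a*b*c*d*fff' | mark_line 2
```

The first line should pass through unchanged and the second should come out as a*b*c*d*FFF. Without -v, awk takes the k3=... argument as the program text and then tries to open the real program as a file, which would explain the "cannot open file" message above.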
It works. Just saying thanks is not enough for you guys, but for now, thanks a lot.
---------- Post updated at 11:59 PM ---------- Previous update was at 10:40 PM ----------
I'm afraid to ask a question again. I'm not adding any new requirements, but I have a few more cases (scenarios). This will be my last question.
Whatever you provided works fine.
Here I have two more scenarios:
Example 3
input
NM1*IL*1*AYNCH*AILLIAM****AI*R02089540
output
NM1*IL*1*AYNCH*AIOORZN****AI*R02080540
Example 4
input
NM1*IL*1*AYNCH*AILLIAM****AI*R02089540
output
NM1*IL*1*AYNCH*AIOOIAM****AI*R02080540
I would have thought this would be obvious by now, but:
$5=rot($5)
replaces the 5th field with every alphanumeric character (starting at the beginning of the field through the end of the field) rotated.
$5=rot($5,1,4)
replaces the 5th field with up to 4 alphanumeric characters starting in position 1 rotated.
$5=rot($5,3)
replaces the 5th field with every alphanumeric character starting in position 3 (to the end of the field) rotated (as in example 3). And,
$5=rot($5,3,2)
replaces the 5th field with up to 2 alphanumeric characters starting in position 3 rotated (as in example 4).
I tried it
Example3:
$5=rot($5,3)
Input:
AILLIAM
Output:
OORZN
Starting from position 3, it converts everything, but the first two characters got removed.
Example4:
$5=rot($5,2,3)
Input:
AILLIAM
Output:
ROLIAM
The first letter is missing, and then it converts only 2 letters (positions 2 and 3).
In Chubler_XL's function rot(), change the line:
for(i=start;i<=length(string);i++) {
to:
for(i=length(string);i;i--) {
I changed it:
$5=rot($5,2,3)
Input :
03079
output :
97960
What I'm expecting:
06929
I have been fighting a losing battle with the flu. I didn't realize it had affected my ability to read and write code. Please accept my humble apologies.
The arguments to Chubler_XL's rot() function are:
rot(string_to_process, starting_position, ending_position)
where starting_position and ending_position are offsets in string (with offset 1 being the 1st character in string). If ending_position is not supplied, it defaults to the offset of the last character in string. If neither starting_position nor ending_position is supplied, rot() acts on every character in string.
The call to rot() for your example 3 is still:
$5=rot($5,3)
but the call to rot() for your example 4 should be:
$5=rot($5,2,4)
And, change my earlier (mistaken) fix to Chubler_XL's code to be:
for(i=1;i<=length(string);i++) {
Sorry.
I'm going back to bed.
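For reference, here is a sketch of rot() with the loop starting at 1, wrapped so that individual strings can be tried from the command line. The wrapper name rot_demo and the variables s, a and b are assumptions made for this illustration; the function body follows Chubler_XL's code, with c added to the local variable list.

```shell
# rot_demo is a hypothetical wrapper: $1 = string, $2 = start (optional), $3 = end (optional).
rot_demo() {
  awk -v s="$1" -v a="$2" -v b="$3" '
  function rot(string, start, end,    i, c, ret) {
    if (!end) end = length(string)      # no end given: go to the last character
    if (!start) start = 1               # no start given: begin at the first character
    for (i = 1; i <= length(string); i++) {
      c = substr(string, i, 1)
      # rotate only characters inside start..end that have a mapping; copy the rest
      ret = ret ((i >= start && i <= end && (c in N)) ? N[c] : c)
    }
    return ret
  }
  BEGIN {
    F = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    T = "ZYXWVUTSRQPONMLKJIHGFEDCBAzyxwvutsrqponmlkjihgfedcba9876543210"
    for (i = 1; i <= length(F); i++) N[substr(F, i, 1)] = substr(T, i, 1)
    print rot(s, a, b)
  }'
}
rot_demo AILLIAM        # ZROORZN (whole field, example 2)
rot_demo AILLIAM 1 4    # ZROOIAM (example 1)
rot_demo AILLIAM 3      # AIOORZN (example 3)
rot_demo AILLIAM 3 4    # AIOOIAM (the output listed for example 4)
rot_demo 03079 2 4      # 06929
```

With start and end both treated as positions, the field shown in the original example 4 (AIOOIAM) corresponds to positions 3 through 4, while the 03079 case corresponds to positions 2 through 4.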