Awk command to replace specific position characters.

pinnacle · January 31, 2012, 6:34pm

Hi,
I have a fixed width file.
The way this file works is say for example there are 30 columns in it each with different sizes say 10,5,2, etc...
If data in a field is less than the field size the rest of it is loaded with spaces.
I would like an awk command to that would replace

I have tried following but doesnt work

 
awk '{printf "%-"ABC"s\n",$0}' File> File1

Please let me know with the correct command.

---------- Post updated at 07:34 PM ---------- Previous update was at 06:35 PM ----------

I now tried this:

 
awk -F "" '{for (i=1;i<=NF;i++) if (i==96||i==97||i==98) $i="ABC"}1' OFS="" urfile

Need help with little more tweaking.

Chubler_XL · January 31, 2012, 7:11pm

How about this:

awk 'function repl(s,f,t,v)
{ return substr(s,1,f-1) sprintf("%-*s", t-f+1, v) substr(s,t+1) }
{ a=repl($0,96,98,"ABC")
  a=repl(a,99,113,"DEF")
  a=repl(a,190,198,"XYZ")
  print a
}' infile

pinnacle · January 31, 2012, 9:22pm

Thanks Chubler_XL. This works.
Sir could you please explain what we are doing in this.
I understand that we defined a function called "repl" which takes four parameters.
And then calling it 3 times.
I dont understand the following:

And also when we are replacing characters from 99 to 113 with "DEF" without trailing spaces. How does it not mess the format.

Would really appreciate if you could explain this to me.

mirni · January 31, 2012, 11:49pm

That's an awesome solution. I will try to analyze it, Chubler please correct if I mess up along the way
Let's look at the sprintf call:

sprintf("%-*s", t-f+1, v)

The format flags of sprintf mean the following:

So (t-f+1) is the width of the string v to be printed. Now the repl() calls:

function repl(s,f,t,v)
{ return substr(s,1,f-1) sprintf("%-*s", t-f+1, v) substr(s,t+1) }

takes a string s, position "from" f, position "to" t, and what to insert "v". It will only modify the string from f to t, inserting v properly padded.

a=repl($0,96,98,"ABC")

will take the whole line ($0) and replace the characters 96-98 with "ABC".

a=repl(a,99,113,"DEF")

will take a and replace chars 99-113 with "DEF" left aligned and padded.

a=repl(a,190,198,"XYZ")

will replace chars 190-198 of a with "XYZ" left-aligned and padded as needed.
So it is modifying the variable a, one step at a time, and taking the output of previous call as input for next one. The same 3 calls could be written nested as:

 a=repl(repl(repl($0,96,98,"ABC"),99,113,"DEF"),190,198,"XYZ")

But we can all agree, that the former is much more readable, debuggable and commentable.

Chubler_XL · January 31, 2012, 11:59pm

Ill try and explain this first:

sprintf("%-*s", t-f+1, v) substr(s,t+1)
s=string
f=from
t=to
v=New value
 
t-f+1 = length of new string

sprintf will left justify New value to width of (t-f+1), padded with spaces.

So repl function returns:

(chars 1 => (FROM-1)) +
padded New value +
(chars (TO + 1) => end)

Note that function originally takes $0 (input line) and stores result in 'a'.
All other calls work on 'a' and store back into 'a' again.

a = orginal string replaced at chars 96-98
a = a (result from 1) replaced at chars 99-113
a = a (result from 2) replaced at chars 190-198

-------
Edit: Nice description Mirni - I was desk checking my description while you posted, but I think you explained it more clearly.

pinnacle · February 1, 2012, 1:14am

Thanks Mirni and Chubler.

I got most of it except the following.

substr(s,1,f-1) --> line before the string that needs to be replaced

sprintf("%-*s", t-f+1, v) --> here "s" represents the entire line. t-f+1 lenght of string being replaced, v is the new string to be inserted. But where are we saying the start and end position to the sprintf function for the new string to be inserted. We are just passing the length of the new string wanted.

Here is where i need little more help. I looked at the sprint function man page but is not of much help.

substr(s,t+1) --> line after substring to be replaced.

And also we are not using any concatenate function to join various substrings. But in sql and other Datawarehouse tools we use concatenate function.

Appreciate your help on this.

mirni · February 1, 2012, 3:17am

No, 's' in the format stands for "string". It tells sprintf that the argument to be printed is a string (as opposed to %d which would expect a decimal number or %f which would expect a float).

var="MyString"
sprintf("%s", var)  #will print "MyString"

So

 sprintf("%-*s", t-f+1, v)

is printing the string stored in variable v, with the format "%-*s", t-f+1 telling sprintf to justify left (-) and pad with (t-f+1) spaces.

Space concatenates strings in awk. So

a="one" "two"

will produce "onetwo"

a=b c  #concat of variables
a = $1 $2 #concats first and second field

pinnacle · February 2, 2012, 12:29pm

Thanks Mirni!!

Need a sligh modification.

I would like the replacement to happen only for the first 1000 rows.
I know i have to add some where NR<=1000. But not sure exactly where.

 
awk 'function repl(s,f,t,v)
{ return substr(s,1,f-1) sprintf("%-*s", t-f+1, v) substr(s,t+1) }
{ a=repl($0,96,98,"ABC")
  a=repl(a,99,113,"DEF")
  a=repl(a,190,198,"XYZ")
  print a
}' infile

Help is appreciated.

Corona688 · February 2, 2012, 12:52pm

What should it do for the rest of the rows? Print them unmodified?

 
awk 'function repl(s,f,t,v)
{ return substr(s,1,f-1) sprintf("%-*s", t-f+1, v) substr(s,t+1) }
NR<=1000 { a=repl($0,96,98,"ABC")
  a=repl(a,99,113,"DEF")
  a=repl(a,190,198,"XYZ")
  print a
} NR>1000' infile

The first expression with NR tells it when to execute that entire code block. The second expression with NR implicitly tells it when to print. So when NR<=1000, the code block runs and prints a, and the NR>1000 expression does not print the line. When NR>1000, the code block is skipped, but the NR>1000 expression prints entire lines.