Combining gsub and substr in awk

giannicello · November 17, 2010, 2:23pm

I have data a.txt:

I want to reformat file to look like this:

Basically, the 3rd column should have its leading zeros removed.

My code a.awk:

awk '{ v=substr($0, 48,10); print substr($0,1,7)"|"substr($0,8,30)"|"gsub("0*","",v)"|"substr($0,59,4)substr($0,64,2)substr($0,67,2)"|"}' a.txt

always returns '2' in the third column. I'm wondering why, and how I can strip out the leading zeroes while awk'ing.

I'm trying to avoid another while-read loop over the file just to make that substitution.
I would use 'bc' if that is possible.

Thanks.
Gianni

Chubler_XL · November 17, 2010, 2:37pm

gsub doesn't return the resulting string; it returns the number of substitutions made and updates the variable in place. So run gsub on your variable before the print.

awk '{ v=substr($0, 48,10); gsub("^0*","",v);
print substr($0,1,7)"|"substr($0,8,30)"|"v"|"substr($0,59,4)substr($0,64,2)substr($0,67,2)"|"}' a.txt
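
A minimal sketch of that behaviour, for anyone who wants to see it in isolation. The helper name gsub_demo is made up for illustration, and it uses /^0+/ rather than "^0*" so that the return value is 0 when there is nothing to strip:

```shell
# Hypothetical helper: prints gsub's return value and the modified variable.
gsub_demo() {
  printf '%s\n' "$1" | awk '{
    v = $0
    n = gsub(/^0+/, "", v)   # n = number of substitutions; v is edited in place
    print "returned=" n, "v=" v
  }'
}

gsub_demo 0000000050   # prints: returned=1 v=50
```

Printing gsub(...) directly, as in the original command, prints the count rather than the stripped value.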

anbu23 · November 17, 2010, 2:37pm

$ awk '{ v=substr($0, 48,10); gsub("^0*","",v); print substr($0,1,7)"|"substr($0,8,30)"|" v "|"substr($0,59,4)substr($0,64,2)substr($0,67,2)"|"}' file 
1234567|01234567890abcdefghijklmnopqrs|1|ccyymmdd|
1234567|01234567890abcdefghijklmnopqrs|9|ccyymmdd|
1234567|01234567890abcdefghijklmnopqrs|50|ccyymmdd|
$ awk '{ print substr($0,1,7)"|"substr($0,8,30)"|" int(substr($0, 48,10)) "|"substr($0,59,4)substr($0,64,2)substr($0,67,2)"|"}' file
1234567|01234567890abcdefghijklmnopqrs|1|ccyymmdd|
1234567|01234567890abcdefghijklmnopqrs|9|ccyymmdd|
1234567|01234567890abcdefghijklmnopqrs|50|ccyymmdd|

ctsgnb · November 17, 2010, 2:38pm

sed 's:0:|0:;s:ccyy-mm-dd0*:|:;s:P:|:;s:-::g;s:[A-Z]*$:|:' a.txt

$ cat a.txt
123456701234567890abcdefghijklmnopqrsccyy-mm-dd0000000001Pccyy-mm-ddABCDEFGH
123456701234567890abcdefghijklmnopqrsccyy-mm-dd0000000009Pccyy-mm-ddABCDEFGH
123456701234567890abcdefghijklmnopqrsccyy-mm-dd0000000050Pccyy-mm-ddABCDEFGH
$ sed 's:0:|0:;s:ccyy-mm-dd0*:|:;s:P:|:;s:-::g;s:[A-Z]*$:|:' a.txt
1234567|01234567890abcdefghijklmnopqrs|1|ccyymmdd|
1234567|01234567890abcdefghijklmnopqrs|9|ccyymmdd|
1234567|01234567890abcdefghijklmnopqrs|50|ccyymmdd|
$

vgersh99 · November 17, 2010, 2:38pm

awk '{ v=substr($0, 48,10)+0; print substr($0,1,7),substr($0,8,30),v,substr($0,59,4)substr($0,64,2)substr($0,67,2)OFS}' OFS='|' a.txt
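
One difference between the regex approach and the numeric approaches (+0 or int) is worth checking against your real data: what happens when the field is all zeros. The sketch below assumes a hypothetical helper named strip_compare and made-up field values; it is not taken from a.txt:

```shell
# Hypothetical helper comparing regex stripping with numeric conversion.
strip_compare() {
  printf '%s\n' "$1" | awk '{
    s = $0
    sub(/^0+/, "", s)          # string approach: delete leading zeros
    print "regex=[" s "] numeric=[" ($0 + 0) "]"
  }'
}

strip_compare 0000000050   # prints: regex=[50] numeric=[50]
strip_compare 0000000000   # prints: regex=[] numeric=[0]
```

If an all-zero field should come out as 0 rather than as an empty column, the numeric conversion is probably the safer choice.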

giannicello · November 17, 2010, 2:50pm

Wow. Thank you so much everyone!! I like the +0 solution for its simplicity.

-Gianni

Scrutinizer · November 17, 2010, 3:46pm

sed 's/\(.\{7\}\)\(.\{30\}\).\{10\}0*\(.*\)P\(....\)-\(..\)-\(..\).*/\1|\2|\3|\4\5\6|/' file

GNU sed:

sed -r 's/(.{7})(.{30}).{10}0*(.*)P(....)-(..)-(..).*/\1|\2|\3|\4\5\6|/' file

ctsgnb · November 17, 2010, 3:50pm

I intended to avoid the \<n> notation (\1, \2, ...).

Could you run a timing test and post the results with and without the \<n> notation?

Scrutinizer · November 17, 2010, 4:29pm

Yes, it is good to avoid grouping, but in this case the selection is positional, so I don't think your solution will work with the actual input file: it relies on specific values of certain characters, which will probably be different in the real data. You could combine positional selection with pattern anchoring, for example:

sed 's/.\{7\}/&|/;s/|.\{30\}/&|/;s/....-..-..0*0//;s/P/|/;s/-//g;s/.\{8\}$/|/'
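
To make the difference concrete, here is a rough comparison of the two sed commands on a made-up record. The record and the function names positional and value_based are assumptions for illustration only; the record keeps the column layout of a.txt but uses real-looking dates and has a 0 inside the first field:

```shell
# Hypothetical record: same layout as a.txt, different character values.
rec='103456701234567890abcdefghijklmnopqrs2010-11-170000000050P2010-11-17ABCDEFGH'

# Positional version: the first two field boundaries come from character counts.
positional() {
  sed 's/.\{7\}/&|/;s/|.\{30\}/&|/;s/....-..-..0*0//;s/P/|/;s/-//g;s/.\{8\}$/|/'
}

# Value-based version: relies on the literal text "ccyy-mm-dd" and on the first 0 in the line.
value_based() {
  sed 's:0:|0:;s:ccyy-mm-dd0*:|:;s:P:|:;s:-::g;s:[A-Z]*$:|:'
}

printf '%s\n' "$rec" | positional    # prints: 1034567|01234567890abcdefghijklmnopqrs|50|20101117|
printf '%s\n' "$rec" | value_based   # puts the first | right after the leading 1
```

Note that the positional version still assumes the first P in the line is the one that follows the numeric field, so it is a combination of the two ideas rather than a purely positional solution.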

ctsgnb · November 18, 2010, 3:26am

I agree, it sounds good