Hi Everyone,
I have a data file with data as below which contains millions of data and gets loaded to database using SQL loader using positional notation.
AT 0001 000000000100000000000
BE 4 000000000000030000000
DE 0055 000004000000000000000
FR 0 000000000000000110000
CZ 003 000330000000050000000
for eg: pos 7 to 10 will yield
0001
4
0055
0
003
My requirement is to lpad to 4 digits if the string 7-10 is having length < 4 chars.
the output should be like as below.
AT 0001 000000000100000000000
BE 0004 000000000000030000000
DE 0055 000004000000000000000
FR 0000 000000000000000110000
CZ 0003 000330000000050000000
Already burnt a lot of midnight oil for this and what i could able to do is the below.
awk '{print substr($0,7,4) }' /tmp/BKP_PIX_STG_MAST_200205110.dat
followed by sed which is not giving proper result.
I will really be grateful if anyone can focus some insight or guide me in this regard.
Many Thanks...
$ cat data
AT 1 000000000100000000000
BE 4 000000000000030000000
DE 55 000004000000000000000
FR 0 000000000000000110000
CZ 3 000330000000050000000
$ awk '{ printf("%-6s%04d %s\n", $1,$2,$3); }' < data
AT 0001 000000000100000000000
BE 0004 000000000000030000000
DE 0055 000004000000000000000
FR 0000 000000000000000110000
CZ 0003 000330000000050000000
$
1 Like
Thanks Corona688.
The data file is not a space/ fixed delimited file. I have just provided the example with the format which looks delimited. But actually i need to do lpad for the character position 7-11 or it can be 181-184 which is not fixed
Anyway Thanks for your reply and I will take this idea up to achieve my requirement.
RudiC
May 14, 2016, 4:12am
4
How about
awk '{split (POS, P, "-"); P[2]++; printf "%s%04d%s\n", substr($0, 1, P[1]-1), substr($0, P[1], P[2]-P[1]), substr ($0, P[2])}' POS="7-10" file
1 Like
Hi Rudic,
Much thanks for your help. It pretty much serve my purpose.
To be true, I couldn't understand the syntax which I am trying now to understand.
The only discrepancy here is after formatting using the code mentioned by you the next column is getting shifted by 4 spaces. Please see below.
XXXXXXX XXXXX MM11111 ZM 0 000000000000000000000OREGON XXXXXXX HXN 001111 01010 -->Original
XXXXXXX YYYYY MM11111 ZM 0000 000000000000000000000OREGON XXXXXXX HXN 001111 01010 -->After format
If you can help me with the code modification so that the total record length remains the same and next column value doesn't get shifted by 4 spaces it will be really great of you.
Thanks again.
When I run RudiC's code with a file containing:
XXXXXXX XXXXX MM11111 ZM 0 000000000000000000000OREGON XXXXXXX HXN 001111 01010
and the parameters POS="48-51"
and the name of the file containing the above text, I get the output:
XXXXXXX XXXXX MM11111 ZM 0000 000000000000000000000OREGON XXXXXXX HXN 001111 01010
which would seem to produce what you want in a manner similar to the example you provided in post #1 in this thread (without shifting any other data around).
Are you saying that you have changed the format of your input data such that the number in the field you are zero-filling is right justified instead of left justified as it was in your original sample input???
1 Like
Aia
May 15, 2016, 1:18am
7
$ cat sample.txt
AT 0001 000000000100000000000
BE 4 000000000000030000000
DE 0055 000004000000000000000
FR 0 000000000000000110000
CZ 003 000330000000050000000
If the range is 7 to 10, blue 10-7, red 7-1:
$ perl -pe '$sub = substr $_, 6, 4 and $padd = sprintf "%04d", $sub and s/$sub/$padd/' sample.txt
AT 0001 000000000100000000000
BE 0004 000000000000030000000
DE 0055 000004000000000000000
FR 0000 000000000000000110000
CZ 0003 000330000000050000000
If the range is 48 to 51, blue 51-48, red 48-1:
$ perl -pe '$sub = substr $_, 47, 3 and $padd = sprintf "%04d", $sub and s/$sub/$padd/' showdown.txt
XXXXXXX XXXXX MM11111 ZM 0000 000000000000000000000OREGON XXXXXXX HXN 001111 01010
If the range is 181 to 184, blue 184-181, red 181-1
You can let the calculation be done by the program if the range is supplied in the form of (x,y), for example.
$ perl -pe 'BEGIN{@p=(48,51); $len=$p[1]-$p[0]; --$p[0]} $sub = substr $_, $p[0], $len and $padd = sprintf "%04d", $sub and s/$sub/$padd/' showdown.txt
XXXXXXX XXXXX MM11111 ZM 0000 000000000000000000000OREGON XXXXXXX HXN 001111 01010
1 Like
RudiC
May 15, 2016, 2:47am
8
Is it possible you searched for 8 chars to be replaced by a 4 char zero padded string? Try
awk '
{split (POS, P, "-")
P[2]++
LEN = P[2] - P[1]
printf "%s%0*d%s\n", substr($0, 1, P[1]-1), LEN, substr($0, P[1], LEN), substr ($0, P[2])
}
' POS="45-55" file
XXXXXXX XXXXX MM11111 ZM 0 000000000000000000000OREGON XXXXXXX HXN 001111 01010 -->Original
XXXXXXX XXXXX MM11111 ZM 00000000000 000000000000000000000OREGON XXXXXXX HXN 001111 01010 --> After
1 Like
Thanks Rudic. Your code is working fine. I did some mistake in the positional notation.
Thank You Don for jumping in and assisting.
Aia - I will definitely try all the codes you have mentioned.
Many thanks to everyone for your kind help.