Search comppare replace (substr/lppad)

Showdown · May 13, 2016, 6:25pm

Hi Everyone,

I have a data file with data as below which contains millions of data and gets loaded to database using SQL loader using positional notation.

AT    0001      000000000100000000000
BE    4         000000000000030000000
DE    0055      000004000000000000000
FR    0         000000000000000110000
CZ    003       000330000000050000000

for eg: pos 7 to 10 will yield

My requirement is to lpad to 4 digits if the string 7-10 is having length < 4 chars.

the output should be like as below.

AT    0001      000000000100000000000
BE    0004      000000000000030000000
DE    0055      000004000000000000000
FR    0000      000000000000000110000
CZ    0003      000330000000050000000

Already burnt a lot of midnight oil for this and what i could able to do is the below.

awk '{print substr($0,7,4) }' /tmp/BKP_PIX_STG_MAST_200205110.dat

followed by sed which is not giving proper result.

I will really be grateful if anyone can focus some insight or guide me in this regard.

Many Thanks...

Corona688 · May 13, 2016, 6:56pm

$ cat data
AT    1      000000000100000000000
BE    4      000000000000030000000
DE    55      000004000000000000000
FR    0      000000000000000110000
CZ    3      000330000000050000000

$ awk '{ printf("%-6s%04d      %s\n", $1,$2,$3); }' < data
AT    0001      000000000100000000000
BE    0004      000000000000030000000
DE    0055      000004000000000000000
FR    0000      000000000000000110000
CZ    0003      000330000000050000000

$

Showdown · May 13, 2016, 7:05pm

Thanks Corona688.

The data file is not a space/ fixed delimited file. I have just provided the example with the format which looks delimited. But actually i need to do lpad for the character position 7-11 or it can be 181-184 which is not fixed

Anyway Thanks for your reply and I will take this idea up to achieve my requirement.

RudiC · May 14, 2016, 4:12am

How about

awk '{split (POS, P, "-"); P[2]++; printf "%s%04d%s\n", substr($0, 1, P[1]-1), substr($0, P[1], P[2]-P[1]), substr ($0, P[2])}' POS="7-10" file

Showdown · May 14, 2016, 3:05pm

Hi Rudic,

Much thanks for your help. It pretty much serve my purpose.
To be true, I couldn't understand the syntax which I am trying now to understand.

The only discrepancy here is after formatting using the code mentioned by you the next column is getting shifted by 4 spaces. Please see below.

XXXXXXX XXXXX                 MM11111    ZM    0         000000000000000000000OREGON XXXXXXX HXN  001111  01010  -->Original
XXXXXXX YYYYY                 MM11111    ZM 0000     000000000000000000000OREGON XXXXXXX HXN  001111  01010      -->After format

If you can help me with the code modification so that the total record length remains the same and next column value doesn't get shifted by 4 spaces it will be really great of you.

Thanks again.

Don_Cragun · May 14, 2016, 5:34pm

When I run RudiC's code with a file containing:

XXXXXXX XXXXX                 MM11111    ZM    0         000000000000000000000OREGON XXXXXXX HXN  001111  01010

and the parameters POS="48-51" and the name of the file containing the above text, I get the output:

XXXXXXX XXXXX                 MM11111    ZM    0000      000000000000000000000OREGON XXXXXXX HXN  001111  01010

which would seem to produce what you want in a manner similar to the example you provided in post #1 in this thread (without shifting any other data around).

Are you saying that you have changed the format of your input data such that the number in the field you are zero-filling is right justified instead of left justified as it was in your original sample input???

Aia · May 15, 2016, 1:18am

$ cat sample.txt
AT    0001      000000000100000000000
BE    4         000000000000030000000
DE    0055      000004000000000000000
FR    0         000000000000000110000
CZ    003       000330000000050000000

If the range is 7 to 10, blue 10-7, red 7-1:

$ perl -pe '$sub = substr $_, 6, 4 and $padd = sprintf "%04d", $sub and s/$sub/$padd/' sample.txt
AT    0001      000000000100000000000
BE    0004      000000000000030000000
DE    0055      000004000000000000000
FR    0000      000000000000000110000
CZ    0003      000330000000050000000

If the range is 48 to 51, blue 51-48, red 48-1:

$ perl -pe '$sub = substr $_, 47, 3 and $padd = sprintf "%04d", $sub and s/$sub/$padd/' showdown.txt
XXXXXXX XXXXX                 MM11111    ZM    0000       000000000000000000000OREGON XXXXXXX HXN  001111  01010

If the range is 181 to 184, blue 184-181, red 181-1

You can let the calculation be done by the program if the range is supplied in the form of (x,y), for example.

$ perl -pe 'BEGIN{@p=(48,51); $len=$p[1]-$p[0]; --$p[0]} $sub = substr $_, $p[0], $len and $padd = sprintf "%04d", $sub and s/$sub/$padd/' showdown.txt
XXXXXXX XXXXX                 MM11111    ZM    0000       000000000000000000000OREGON XXXXXXX HXN  001111  01010

RudiC · May 15, 2016, 2:47am

Is it possible you searched for 8 chars to be replaced by a 4 char zero padded string? Try

awk ' 
        {split (POS, P, "-")
         P[2]++
         LEN = P[2] - P[1]
         printf "%s%0*d%s\n", substr($0, 1, P[1]-1), LEN, substr($0, P[1], LEN), substr ($0, P[2])
        }
' POS="45-55" file
XXXXXXX XXXXX                 MM11111    ZM    0         000000000000000000000OREGON XXXXXXX HXN  001111  01010  -->Original
XXXXXXX XXXXX                 MM11111    ZM 00000000000  000000000000000000000OREGON XXXXXXX HXN  001111  01010  --> After

Showdown · May 16, 2016, 4:15pm

Thanks Rudic. Your code is working fine. I did some mistake in the positional notation.

Thank You Don for jumping in and assisting.

Aia - I will definitely try all the codes you have mentioned.

Many thanks to everyone for your kind help.