Advanced search using sed/awk/perl

Hi,

I have a file with more than 50,000 lines of records and each record is 50 bytes in length.

I need to check two fields in every record of this file: positions 11-19 (9 bytes) and 32-40 (9 bytes). If either field is alphanumeric (contains letters), I need to replace all 9 bytes of that field with a default numeric value (say, 888888888).

For example, say the input file looks like:

D00000000236778767878     745454545456785.7USA8762
D000000001SMF46567878     458795477876785.7RSA9763
D00000000223684589878     11254DUT7876785.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D000000009785DOT67878     66589455DOS6785.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D000000005423SOW67878     9685658TSB76785.7AUS8765
D000000008TUR59767878     55425TUR5976785.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965

After the search and replace, the output file should look like:

D00000000236778767878     745454545456785.7USA8762
D00000000188888888878     458795477876785.7RSA9763
D00000000223684589878     112548888888885.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D00000000988888888878     665898888888885.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D00000000588888888878     968568888888885.7AUS8765
D00000000888888888878     554258888888885.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965

Also, I need a log of all the values that the script replaces.

I tried using awk, sed and perl scripts but could not get the desired output.

One of the attempts I made with sed, which did not work:

sed "s/^\(.\{10\}\}\)\([a-zA-Z]*\).*/\1888888888/"

Any help regarding this will be much appreciated. Thanks in advance

I do not know sed. My awk may be a bit much, but this works. Set the variable 'd' to your 8's; I tested with underscores.

mute@geek:~/test$ awk -v FS='' -v d=_________ 'substr($0,11,9) ~ /[A-Za-z]/ { $0=substr($0,1,10) d substr($0,20) } substr($0,32,9) ~ /[A-Za-z]/ { $0=substr($0,1,31) d substr($0,41) }1' file.txt
D00000000236778767878     745454545456785.7USA8762
D000000001_________78     458795477876785.7RSA9763
D00000000223684589878     11254_________5.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D000000009_________78     66589_________5.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D000000005_________78     96856_________5.7AUS8765
D000000008_________78     55425_________5.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965
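
The question also asks for a log of the replaced values, which the command above does not produce. A minimal sketch of one way to add it, keeping the same letter test: the function name fix_records, the awk variable log_file and the log file name are hypothetical, picked only for illustration.

```shell
# Sketch: replace the fields at columns 11-19 and 32-40 when they contain
# a letter, and log every replaced value.
# fix_records and log_file are hypothetical names, not from the thread.
fix_records() {
  # $1 = input file, $2 = log file (created only if something is replaced)
  awk -v d=888888888 -v log_file="$2" '
    function fix(start,    val) {
      val = substr($0, start, 9)
      if (val ~ /[A-Za-z]/) {
        # log: record number, starting column, original value
        print NR, start, val > log_file
        $0 = substr($0, 1, start - 1) d substr($0, start + 9)
      }
    }
    { fix(11); fix(32); print }
  ' "$1"
}
```

It could be called as fix_records file.txt replaced.log > file.new, which leaves the original file untouched and writes one log line per replaced field.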

% RE='[A-Z]' DEFAULT=888888888
% perl -wlpe \
'@F = /(.{10})(.{9})(.{12})(.{9})(.*)/;
$F[1] =~ /'$RE'/ and $F[1] = '$DEFAULT';
$F[3] =~ /'$RE'/ and $F[3] = '$DEFAULT';
$_ = join "", @F;
' testfile
D00000000236778767878     745454545456785.7USA8762
D00000000188888888878     458795477876785.7RSA9763
D00000000223684589878     112548888888885.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D00000000988888888878     665898888888885.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D00000000588888888878     968568888888885.7AUS8765
D00000000888888888878     554258888888885.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965

Thanks Neutronscott and Yazu for the quick response!

Really appreciate your help.

# sed 's/\(.\{10\}\)[A-Z]\{3\}[0-9]\{6\}\(.\{12\}\(.\{9\}\).\{10\}\)/\1888888888\2/
s/\(.\{10\}\)[0-9]\{3\}[A-Z]\{3\}[0-9]\{3\}\(.\{12\}\(.\{9\}\).\{10\}\)/\1888888888\2/
s/\(.\{31\}\)[A-Z]\{3\}[0-9]\{6\}\(.\{10\}\)/\1888888888\2/
s/\(.\{31\}\)[0-9]\{2,3\}[A-Z]\{3\}[0-9]\{3,4\}\(.\{10\}\)/\1888888888\2/' file >newfile ; more newfile
D00000000236778767878     745454545456785.7USA8762
D00000000188888888878     458795477876785.7RSA9763
D00000000223684589878     112548888888885.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D00000000988888888878     665898888888885.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D00000000588888888878     968568888888885.7AUS8765
D00000000888888888878     554258888888885.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965
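
Note that the expressions above match specific layouts (three letters followed by six digits, and so on). A more general variant, sketched here on the assumption that any non-digit in a field should trigger the replacement, uses an address to skip records whose field is already nine digits. The function name fix_with_sed is hypothetical.

```shell
# Sketch: for each field, skip the record if the field is nine digits,
# otherwise overwrite the nine bytes at that position with the default.
# fix_with_sed is a hypothetical name used only for illustration.
fix_with_sed() {
  # $1 = input file; corrected records go to stdout
  sed -e '/^.\{10\}[0-9]\{9\}/!s/^\(.\{10\}\).\{9\}/\1888888888/' \
      -e '/^.\{31\}[0-9]\{9\}/!s/^\(.\{31\}\).\{9\}/\1888888888/' "$1"
}
```

Because the replacement has the same length as the field, the second expression still finds its field at offset 31 after the first one has run.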

regards
ygemici

Alternate sed..

sed '/^[A-Z][0-9]*  *.*/!s/\(.\{10\}\)\(.\{9\}\)\(..\)  *\(.*\)$/\1*********\3     \4/ ; /^[^ ]*  *.....[0-9]\{9\}.*$/!s/\([^ ]*\)  *\(.....\)\(.\{9\}\)\(.*\)$/\1     \2*********\4/' inputfile

And alternate Perl:

$
$ # display the contents of the data file
$ cat data
D00000000236778767878     745454545456785.7USA8762
D000000001SMF46567878     458795477876785.7RSA9763
D00000000223684589878     11254DUT7876785.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D000000009785DOT67878     66589455DOS6785.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D000000005423SOW67878     9685658TSB76785.7AUS8765
D000000008TUR59767878     55425TUR5976785.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965
$
$ # display the contents of the Perl program
$ cat process.pl
#!perl -w
$old = "data";     # the data file that we begin with
$new = "data.new"; # temporary data file
$log = "log";      # log file to record only those records that will be modified

$DEFAULT = "888888888";  # default replacement value

open (OLD, "<", $old) or die "Can't open $old for reading: $!";
open (NEW, ">", $new) or die "Can't open $new for writing: $!";
open (LOG, ">", $log) or die "Can't open $log for writing: $!";
while (<OLD>) {
  if (substr($_,10,9) =~ /\D/ or substr($_,31,9) =~ /\D/) {
    # log the record, then replace only the field(s) that contain a non-digit
    print LOG "$.\t$_";
    substr($_,10,9) = $DEFAULT if substr($_,10,9) =~ /\D/;
    substr($_,31,9) = $DEFAULT if substr($_,31,9) =~ /\D/;
  }
  # write the (possibly modified) record to the temp file
  print NEW $_;
}
close (OLD) or die "Can't close $old: $!";
close (NEW) or die "Can't close $new: $!";
close (LOG) or die "Can't close $log: $!";

# rename NEW file to OLD file, effectively overwriting the old file
rename($new, $old) or die "can't rename $new to $old: $!";
$
$ # Run the program
$ perl process.pl
$
$ # Check the modified data file
$ cat data
D00000000236778767878     745454545456785.7USA8762
D00000000188888888878     458795477876785.7RSA9763
D00000000223684589878     112548888888885.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D00000000988888888878     665898888888885.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D00000000588888888878     968568888888885.7AUS8765
D00000000888888888878     554258888888885.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965
$
$ # Check the log file
$ cat log
2       D000000001SMF46567878     458795477876785.7RSA9763
3       D00000000223684589878     11254DUT7876785.7IND8762
7       D000000009785DOT67878     66589455DOS6785.7KEN8963
10      D000000005423SOW67878     9685658TSB76785.7AUS8765
11      D000000008TUR59767878     55425TUR5976785.7ARG4669
$
$

tyler_durden

Try this:

perl -lne '$a=substr $_,10,9;$b=substr $_,31,9; substr($_,10,9)="---------" if($a!~/^\d{9}$/);substr($_,31,9)="---------" if($b!~/^\d{9}$/);print' input