Masking data for different file formats

Hi,
I have three kinds of files that contain dates which need to be masked. The files look like this:

File 1 (all contents in 1 line):
input:

DTM+7:201103281411:203'LOC+175+SGSIN:139:6+TERMINATOR......'DTM+132:201103281413:203'LOC....

output:

DTM+7:'''''''''''':203'LOC+175+SGSIN:139:6+TERMINATOR......'DTM+132:'''''''''''':203'LOC....

File 2 (contents divided into many lines):
input:

EXIT201103281044IAAU 3680272 4363400018000DD                 FIL  IL  LEA PERDANA      006     006         MANLEE   FD LD RD DD                    N
 
DISC201103281101TCKU 9672645 45G1400022500DD                 FIL  RC      VIRO BHUM        S079    S079                                             0300312TCNTXG                     NN N

output:

EXIT''''''''1044IAAU 3680272 4363400018000DD                 FIL  IL  LEA PERDANA      006     006         MANLEE   FD LD RD DD                    N
 
DISC''''''''1101TCKU 9672645 45G1400022500DD                 FIL  RC      VIRO BHUM        S079    S079                                             0300312TCNTXG                     NN N

File 3:
input (the date to be masked is near HHDR, right after the first six digits that follow it):

HHDR   01010020110208000004NYK VERANICD     266                                                    2011020704100020110207181500                                                                                                                                                                                                                                                                                 D              NYKU 5629211                                                                                              F22521  NY   NY      N                                                                                                                          N                                                 NZLYT             NZZ063854            4510LCI                                       C                                                                                                                                                                                                                                                                                                                                                                                                               HHDR   01010020110208000004ARUNIRICKMER     008W                                                   2011020717050020110208010000

output:

HHDR   010100''''''''000004NYK VERANICD     266                                                    2011020704100020110207181500                                                                                                                                                                                                                                                                                 D              NYKU 5629211                                                                                              F22521  NY   NY      N                                                                                                                          N                                                 NZLYT             NZZ063854            4510LCI                                       C                                                                                                                                                                                                                                                                                                                                                                                                               HHDR   010100''''''''000004ARUNIRICKMER     008W                                                   2011020717050020110208010000

I need a generic solution that masks these dates no matter which of the three kinds of file is supplied. Handling each kind on its own is not hard, but handling all three together has me stuck.

sed '
/^EXIT/s/^\(....\).\{8\}/\1########/
/^DISC/s/^\(....\).\{8\}/\1########/
/DTM/s/\(DTM[^:]*:\).\{12\}/\1############/g
/HHDR/s/\(HHDR[^0-9]*......\).\{8\}/\1########/g
' infile >outfile

or

sed '
/^EXIT/s/^\(....\)......../\1########/
/^DISC/s/^\(....\)......../\1########/
/DTM/s/\(DTM[^:]*:\)............/\1############/g
/HHDR/s/\(HHDR[^0-9]*......\)......../\1########/g
' infile >outfile

You can also pipe the result through tr \# \' if you want the mask to use single quotes instead of hashes:

sed '
/^EXIT/s/^\(....\)......../\1########/
/^DISC/s/^\(....\)......../\1########/
/DTM/s/\(DTM[^:]*:\)............/\1############/g
/HHDR/s/\(HHDR[^0-9]*......\)......../\1########/g
' infile | tr \# \' >outfile
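
To check the script above against the samples from the question, here is a minimal sketch. It assumes a POSIX sed and tr, wraps the same four substitutions in a function called mask_dates (a made-up name, not part of the original answer), and feeds it shortened versions of the sample lines.

```shell
#!/bin/sh
# mask_dates is a hypothetical helper name: it reads stdin, writes stdout.
mask_dates() {
    sed '
/^EXIT/s/^\(....\).\{8\}/\1########/
/^DISC/s/^\(....\).\{8\}/\1########/
/DTM/s/\(DTM[^:]*:\).\{12\}/\1############/g
/HHDR/s/\(HHDR[^0-9]*......\).\{8\}/\1########/g
' | tr '#' "'"
}

# Shortened sample lines from the question, one per file type.
printf '%s\n' "DTM+7:201103281411:203'LOC+175'DTM+132:201103281413:203'LOC" | mask_dates
printf '%s\n' "EXIT201103281044IAAU 3680272" | mask_dates
printf '%s\n' "HHDR   01010020110208000004NYK VERANICD" | mask_dates
```

One thing to keep in mind: tr converts every # on the line, so data that already contains a hash would be altered as well.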

Hi,

Using 'perl':

$ cat script.pl
use strict;
use warnings;

@ARGV || die "Usage: perl $0 file1 file2 ...\n";

while ( my $file = shift @ARGV ) {
    # Open files. On failure, warn about the problem and move on to the next file.
    open my $ifh, "<", $file or do { warn "Cannot open $file for reading: $!\n"; next };
    open my $ofh, ">", $file . ".out" or do { warn "Cannot open $file.out for writing: $!\n"; next };

    # Check the type of each line by its leading tag.
    while ( <$ifh> ) {
        # File 1.
        if ( /^DTM/ ) {
            s/(DTM\+\d*:)\d{12}/$1 . ("'" x 12)/eg;
        # File 2.
        } elsif ( /^(?:EXIT|DISC)/ ) {
            s/^(EXIT|DISC)\d{8}/$1 . ("'" x 8)/e;
        # File 3.
        } elsif ( /^HHDR/ ) {
            s/(HHDR\s+\d{6})\d{8}/$1 . ("'" x 8)/eg;
        }

        print $ofh $_;
    }

    close $ifh or warn "Error found closing $file: $!\n";
    close $ofh or warn "Error found closing $file.out: $!\n";
}
$ perl script.pl
Usage: perl script.pl file1 file2 ...
$ perl script.pl file1 file2 file3
(Each output file is named after its input with '.out' appended: file1.out, file2.out and file3.out in this example.)
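
If perl is not available, roughly the same behaviour can be sketched with sed. This is only an approximation under a few assumptions: mask_files is a made-up name, the patterns match digits only (as the Perl version does), and the script is double-quoted so the single quotes can be written directly without tr. Unlike the if/elsif chain, sed tries every rule on every line, which makes no difference for the sample data.

```shell
#!/bin/sh
# mask_files is a hypothetical helper: for each FILE argument it writes a
# masked copy to FILE.out, warning about and skipping unreadable files.
mask_files() {
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "Cannot open $f for reading" >&2
            continue
        fi
        # Double-quoted script: the backslashes reach sed unchanged and the
        # replacement can contain literal single quotes (8 or 12 of them).
        sed "
/^EXIT/s/^\(EXIT\)[0-9]\{8\}/\1''''''''/
/^DISC/s/^\(DISC\)[0-9]\{8\}/\1''''''''/
/^DTM/s/\(DTM+[0-9]*:\)[0-9]\{12\}/\1''''''''''''/g
/^HHDR/s/\(HHDR  *[0-9]\{6\}\)[0-9]\{8\}/\1''''''''/g
" "$f" > "$f.out"
    done
}

# Usage: mask_files file1 file2 file3
```

Lines that do not start with one of the four tags are copied through unchanged.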

Regards,
Birei

Thanks ctsgnb and birei!
I will try both your solutions.
Btw, I don't really understand what this code is doing (in birei's post):

        # File 1.
        if ( /^DTM/ ) {
            s/(DTM\+\d*:)\d{12}/$1 . ("'" x 12)/eg;
        # File 2.
        } elsif ( /^(?:EXIT|DISC)/ ) {
            s/^(EXIT|DISC)\d{8}/$1 . ("'" x 8)/e;

What do 'eg' and 'e' mean?
And what does 's/' mean?
Note that there are many 'DTM' segments in the one-line file; will the code above handle all of them?
Regards.

Hi,

I'm pasting part of the 'perl' help, which I'm sure explains it much better than I can.

Regards,
Birei
