Masking algorithm

n78298 · July 31, 2015, 7:23pm

I have a requirement to mask a few specific fields in a UNIX file. The details are as follows:

The file is a fixed-length file; each record is 250 characters long.

Two fields need to be masked; their positions are 21:30 and 110:120.

The masking needs to be done character by character, which means each character in these fields (21:30 and 110:120) should be replaced with another character.

The replacement patterns should be kept in a separate file. For example

A:B
B:C

This means A should be replaced with B, B with C, and so on.

Don_Cragun · July 31, 2015, 9:33pm

Is this a homework assignment? Homework assignments must be filed in the Homework & Coursework forum and the 1st post in threads in that forum must contain a completely filled out homework template.

If it isn't homework, you need to make your requirements much clearer. A masking operation would blank out or remove a specific character or a specific set of characters; not replace one set of characters with another set of characters.

And without sample input and corresponding sample output (in CODE tags), your specification is ambiguous. If an A is found and converted to a B (as in your sample), is the output for that position supposed to be a B or should it be a C because the 2nd line in your sample input file says B should be changed to C ?
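The two readings of that sample can be made concrete with a small sketch. The mapping file contents and the input string AB below are made up for illustration; the first command looks each character up exactly once, while the second applies the rules one after another (here with sed, purely to show the difference).

```shell
# Hypothetical two-line mapping file in the post #1 format.
map=$(mktemp)
printf 'A:B\nB:C\n' > "$map"

# Reading 1: single pass. Each input character is looked up once,
# so A becomes B and B becomes C.
single_pass=$(printf 'AB\n' | awk -F: '
FNR == NR { rp[$1] = $2; next }      # first file: load the mapping
{
    out = ""
    for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        out = out ((c in rp) ? rp[c] : c)
    }
    print out
}' "$map" -)

# Reading 2: rules applied in sequence. The B produced by the first
# rule is changed again by the second rule.
chained=$(printf 'AB\n' | sed -e 's/A/B/g' -e 's/B/C/g')

echo "single pass: $single_pass, chained: $chained"
rm -f "$map"
```

With the single-pass reading the result can be reversed; with the chained reading both input characters end up as C and the original can no longer be recovered.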

n78298 · August 2, 2015, 3:07am

This is not an assignment; this is a real-world challenge. Apologies for not stating the problem correctly and not explaining the issue fully.
Maybe calling it a masking solution is not the right thing. My actual requirement is to replace one set of characters with another set of characters, character by character, so that the replacement can easily be reversed as well.
Since it's a character-by-character replacement, every character will have one and only one replacement character. Hence ABCDE should convert to BCDEF when the replacement table is:

A : B
B : C
C : D
D : E
E : F

The first character is the one to be found, and it is replaced by the second character.
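As a rough sketch of the reversibility requirement: the same table can be loaded with its two columns swapped to undo the replacement. The `translate` function and its `enc`/`dec` argument are names invented for this example, and the whole line is translated here rather than only the two field ranges. It assumes the table is one-to-one; note that with the five rules above an F in the original data would be left as F and could not be told apart from a translated E.

```shell
# Mapping file in the "A : B" format shown above (spaces around the colon).
map=$(mktemp)
printf 'A : B\nB : C\nC : D\nD : E\nE : F\n' > "$map"

# translate enc  -> replace column 1 characters with column 2 characters
# translate dec  -> swap the columns to reverse the replacement
translate() {
    awk -F' *: *' -v dir="$1" '
    FNR == NR {                          # first file: load the table
        if (dir == "dec") rp[$2] = $1
        else              rp[$1] = $2
        next
    }
    {                                    # data: one lookup per character
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            out = out ((c in rp) ? rp[c] : c)
        }
        print out
    }' "$map" -
}

encoded=$(printf 'ABCDE\n' | translate enc)
decoded=$(printf '%s\n' "$encoded" | translate dec)
echo "$encoded -> $decoded"
rm -f "$map"
```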

RudiC · August 2, 2015, 5:13am

Given your separate translation file has the structure as in post#1 (no spaces), this might do what you need:

awk '
FNR==NR         {C[$1]=$2
                 next
                }

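FNR==1          {MXR=split (RANGES, T1, " ")            # parse the "start:end" pairs
                 for (i=1; i<=MXR; i++) {split (T1[i], T2, ":")
                                         S[i]=T2[1]+0   # start of range i
                                         E[i]=T2[2]+0   # end of range i
                                        }
                }

                {for (i=1; i<=MXR; i++)
                        {for (j=S[i]; j<=E[i]; j++)
                                if ($j in C) $j=C[$j]   # translate only mapped characters
                        }
                }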
1
' FS=":" file1 FS="" OFS="" RANGES="21:30 110:120" file2

Don_Cragun · August 2, 2015, 5:39am

If your replacement pattern file is in the format shown in post #1 or in the format shown in post #3, or if you are working on a system where setting FS to the empty string doesn't treat each input character as a separate field, you could try this (although all of the field sizes and offsets are built into the code instead of being derived from an input operand):

awk -F'[[:blank:]]*:[[:blank:]]*' '
FNR == NR {
	rp[$1] = $2
	next
}
{	o = substr($0, 1, 20)
	for(i = 21; i <= 30; i++)
		o = o (((c = substr($0, i, 1)) in rp) ? rp[c] : c)
	o = o substr($0, 31, 79)
	for(i = 110; i <= 120; i++)
		o = o (((c = substr($0, i, 1)) in rp) ? rp[c] : c)
	print o substr($0, 121)
}' replacement_pattern_file data_file

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .
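If the hard-coded offsets are a concern, one possible variation is to take the ranges from a variable and still use only substr(), so it does not depend on FS="" splitting. This is only a sketch: the RANGES variable name is borrowed from the earlier post, the sample data and ranges are made up, the mapping file is assumed to have no spaces around the colon, and the ranges are assumed to be sorted and non-overlapping.

```shell
map=$(mktemp); data=$(mktemp)
printf 'A:B\nB:C\n' > "$map"
printf 'ABCABCABC\n' > "$data"           # made-up 9-character record

# RANGES: space-separated start:end pairs, 1-based and inclusive.
masked=$(awk -F: -v RANGES="2:3 7:8" '
BEGIN {
    n = split(RANGES, r, " ")
    for (k = 1; k <= n; k++) {
        split(r[k], se, ":")
        s[k] = se[1] + 0
        e[k] = se[2] + 0
    }
}
FNR == NR { rp[$1] = $2; next }          # first file: load the mapping
{
    out = ""; pos = 1
    for (k = 1; k <= n; k++) {
        out = out substr($0, pos, s[k] - pos)    # copy the untouched part
        for (i = s[k]; i <= e[k]; i++) {         # translate the range
            c = substr($0, i, 1)
            out = out ((c in rp) ? rp[c] : c)
        }
        pos = e[k] + 1
    }
    print out substr($0, pos)                    # copy the rest of the record
}' "$map" "$data")

echo "$masked"
rm -f "$map" "$data"
```

For the 250-character records in this thread the operand would be RANGES="21:30 110:120". Only the characters inside the ranges are looked up one at a time; everything else is copied in bulk.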

n78298 · August 2, 2015, 1:39pm

Thanks RudiC and Don.

Both scripts work beautifully and do the intended translations.

My next challenge is performance, as these scripts may need to run on files with 10M records. However, a test run with 1M records completed in less than 8 minutes.

I will post again if I run into any further challenges with this. I need to build a complete solution across files (with multiple translation fields).