Need to Mask Data

ksmbabu · August 11, 2010, 1:47am

I have a requirement. There is a file with the following contents:

Unix|123|17-01-2010
....
....
....
....
and so on

Each letter has a predefined mapping letter that is used to mask the original data (for example U=A, n=b, i=c, x=d). Digits are mapped the same way (1=9, 2=8, 3=7), and the date field is replaced with a default date, 01-01-1999.

Likewise, I will have many rows in the file. How can I mask the data?

The idea is to mask the original contents. Every letter A to Z and a to z has a corresponding letter to map to, and every digit 0 to 9 has a corresponding digit.

After masking, the masked data should replace the original file content.

Any idea how to go about this requirement?

Thanks
Babu

Ygor · August 11, 2010, 2:30am

I already have a script that does something similar, with one difference: rather than mapping each letter/number to a specific letter/number, it cycles through the alphabet/digits.

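$ echo 'Unix|300|17-01-2010' | awk '
{
    for(i=1;i<=length($0);i++){
         c=substr($0,i,1)
         d=substr($0,i,10)
         # a DD-MM-YYYY date starting at this position becomes the default date
         if(d~/^[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]$/) {
            printf "01-01-1999"
            i+=9
         }
         else if(c~/[0-9]/) {
            # digits cycle through 1234567890
            pn++
            if(pn>10) pn=1
            printf "%s", substr("1234567890", pn, 1)
         }
         else if(c~/[A-Z]/) {
            # letters share one counter, so upper and lower case cycle together
            pl++
            if(pl>26) pl=1
            printf "%s", substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", pl, 1)
         }
         else if(c~/[a-z]/) {
            pl++
            if(pl>26) pl=1
            printf "%s", substr("abcdefghijklmnopqrstuvwxyz", pl, 1)
         }
         else
            printf "%s", c   # "%s" keeps a literal % in the data from being read as a format
    }
    printf ORS
}'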
Abcd|123|01-01-1999

ksmbabu · August 12, 2010, 12:25am

Thanks Ygor

---------- Post updated 08-12-10 at 09:55 AM ---------- Previous update was 08-11-10 at 01:03 PM ----------

Does anybody have an idea that fits my need exactly?

rajamadhavan · August 12, 2010, 2:09am

You can probably create a map file with contents something like this....

abcdefghijklmnopqrstuvwxyz1234567890
374df2dmd94nflt934272840ndn2834m4827

Then run tr to translate using the map file:

tr "$(sed -n 1p map)" "$(sed -n 2p map)" < infile > outfile

I am not sure if that would satisfy your requirements entirely.
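
A minimal sketch of the map-file idea wrapped in a function, assuming a two-line map file like the one above and assuming the date is always the last |-separated field. The name mask_with_tr is hypothetical. Because tr also translates the digits inside the date, the last field is overwritten with the default date afterwards.

```shell
#!/bin/sh
# mask_with_tr MAPFILE: read records on stdin, write masked records on stdout.
# Assumes MAPFILE line 1 lists the source characters and line 2 their
# replacements (same length, and no "-" or "\" which tr treats specially).
mask_with_tr() {
    from=$(sed -n 1p "$1")
    to=$(sed -n 2p "$1")
    # Translate character by character, then overwrite the last
    # |-separated field (assumed to be the date) with the default date.
    tr "$from" "$to" | sed 's/|[^|]*$/|01-01-1999/'
}
```

It could be called as: mask_with_tr map < infile > outfile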

ksmbabu · August 13, 2010, 5:29am

Hi Rajamadhavan

This is not my requirement, but thanks anyway. If you have any idea that suits my need, it would be a great help.

My need is that every upper-case and lower-case letter (A-Z, a-z) has a predefined mask letter (e.g. A=Y ... a=y), and likewise for numbers (e.g. 1=0).

The input file is:

Apple|123|13-08-2010
.....
.....
....
and so on

Babu

rajamadhavan · August 13, 2010, 5:45am

Hi Babu,
The solution I provided was to create a file that defines the letters on one line and the corresponding mask letters on the second line, and then use this map file to mask the input file.

In your setup, where do you fetch the mask letters from?

-Raja

ksmbabu · August 16, 2010, 2:42am

Hi Raja

I fetch the mask letters from another file.

Babu

ygemici · August 16, 2010, 10:35am

# cat infile
Unix|123|17-01-2010
UniX|124|17-01-2010
UnIx|125|17-01-2010
UNix|126|17-01-2010
UNIX|127|17-01-2010
UnIX|128|17-01-2010
unix|129|17-01-2010

# ./justdoit
Abcd|987|01-01-1999
Abcx|986|01-01-1999
Abid|985|01-01-1999
Ancd|984|01-01-1999
Anix|983|01-01-1999
Abix|982|01-01-1999
Ubcd|981|01-01-1999

# cat mask
A=a     a=A     1=9     01-01-1999
B=b     b=B     2=8
C=c     c=C     3=7
D=d     d=D     4=6
E=e     e=E     5=5
F=f     f=F     6=4
G=g     g=G     7=3
H=h     h=H     8=2
I=i     i=c     9=1
K=k     k=K
L=l     l=L
M=m     m=M
N=n     n=b
O=o     o=O
P=p     p=P
R=r     r=R
S=s     s=S
T=t     t=T
U=A     u=U
V=v     v=V
W=w     w=W
X=x     x=d
Y=y     y=Y
Z=z     z=Z

 
## justdoit ##
#!/bin/bash
# Mask name and number per the "mask" file; every date becomes the default date.
datenew=$(awk 'NR==1 {print $4}' mask)   # default date: 4th field of the first mask line
x=0
while IFS="|" read -r name number date
do
    namenew=""; numbernew=""
    # Walk the name one character at a time.
    for new in $(printf '%s\n' "$name" | fold -w 1)
    do
        if [[ $new == [[:upper:]] ]]; then
            # Upper-case mappings are in column 1; anchor the match so "A" does not also hit "U=A".
            mynew=$(awk '{print $1}' mask | sed -n "s/^$new=//p")
        else
            # Lower-case mappings are in column 2.
            mynew=$(awk '{print $2}' mask | sed -n "s/^$new=//p")
        fi
        namenew=$namenew$mynew
    done
    names[x]=$namenew
    # Digit mappings are in column 3.
    for digit in $(printf '%s\n' "$number" | fold -w 1)
    do
        mynumber=$(awk '{print $3}' mask | sed -n "s/^$digit=//p")
        numbernew=$numbernew$mynumber
    done
    numbers[x]=$numbernew
    ((x++))
done < infile
for ((i=0; i<${#numbers[@]}; i++))
do
    echo "${names[i]}|${numbers[i]}|$datenew"
done
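
The script above starts several processes for every character, which gets slow on large files. Here is a sketch of the same lookup done in a single awk pass. It assumes the mask file holds whitespace-separated c=m pairs plus a default date token, as in the mask file shown above; mask_with_awk is a hypothetical name. Unlike the script above, characters without a mapping are passed through unchanged, which may or may not be what you want for masking.

```shell
#!/bin/sh
# mask_with_awk MASKFILE DATAFILE: print DATAFILE with fields 1 and 2 masked
# and field 3 replaced by the default date found in MASKFILE.
mask_with_awk() {
    awk -F'|' -v OFS='|' '
    NR == FNR {                        # first file: load the mapping
        n = split($0, tok, /[ \t]+/)
        for (k = 1; k <= n; k++) {
            if (tok[k] ~ /^.=.$/)      # a "c=m" pair
                map[substr(tok[k], 1, 1)] = substr(tok[k], 3, 1)
            else if (tok[k] != "")     # any other token is taken as the default date
                defdate = tok[k]
        }
        next
    }
    {                                  # second file: mask each record
        for (f = 1; f <= 2; f++) {
            out = ""
            for (i = 1; i <= length($f); i++) {
                c = substr($f, i, 1)
                out = out ((c in map) ? map[c] : c)   # unmapped characters pass through
            }
            $f = out
        }
        $3 = defdate
        print
    }' "$1" "$2"
}
```

It could be called as: mask_with_awk mask infile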

durden_tyler · August 16, 2010, 1:18pm

Since the format of the other file has not been mentioned, I'll assume one of the simplest formats.

Here's a Perl script -

$
$
$ cat f3.mask
A=z
B=y
C=x
D=w
E=v
F=u
G=t
H=s
I=r
K=q
J=p
L=o
M=n
N=m
O=l
P=k
Q=j
R=i
S=h
T=g
U=f
V=e
W=d
X=c
Y=b
Z=a
a=Z
b=Y
c=X
d=W
e=V
f=U
g=T
h=S
i=R
k=Q
j=P
l=O
m=N
n=M
o=L
p=K
q=J
r=I
s=H
t=G
u=F
v=E
w=D
x=C
y=B
z=A
0=9
1=8
2=7
3=6
4=5
5=4
6=3
7=2
8=1
9=0
$
$ cat f3.data
Unix|123|17-01-2010
UniX|124|17-01-2010
UnIx|125|17-01-2010
UNix|126|17-01-2010
UNIX|127|17-01-2010
UnIX|128|17-01-2010
unix|129|17-01-2010
$
$
$ perl -lne 'if ($ARGV eq "f3.mask") {@x=split/=/; $y{$x[0]}=$x[1]}
             else {@z=split/\|/; $z[2]="01-01-1999"; $z[0]=~s/(.)/$y{$1}/g;
                   $z[1]=~s/(.)/$y{$1}/g; print join("|",@z)}
            ' f3.mask f3.data
fMRC|876|01-01-1999
fMRc|875|01-01-1999
fMrC|874|01-01-1999
fmRC|873|01-01-1999
fmrc|872|01-01-1999
fMrc|871|01-01-1999
FMRC|870|01-01-1999
$
$
$

tyler_durden
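
The commands in this thread print the masked data or write it to a separate file, while the original request was for the masked data to replace the file's contents. One common pattern is to write to a temporary file and rename it over the original only if the masking command succeeded. A sketch, assuming mktemp is available; replace_with_masked is a hypothetical helper name and the masking command is supplied by the caller.

```shell
#!/bin/sh
# replace_with_masked FILE CMD [ARGS...]: run CMD with FILE on stdin and
# replace FILE with its output only if CMD succeeds.
replace_with_masked() {
    file=$1; shift
    # Create the temporary file next to the original so the rename
    # stays on the same filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if "$@" < "$file" > "$tmp"; then
        mv "$tmp" "$file"      # note: the result keeps the permissions mktemp set
    else
        rm -f "$tmp"           # leave the original untouched on failure
        return 1
    fi
}
```

For example: replace_with_masked infile tr 'Unix123' 'Abcd987'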