substitution of varying digits

I have a requirement to mask part of a variable (the input) with *: the first six digits stay visible, the next six
are replaced by six *, and the rest stay visible.

ex:
Input - 123456789012345
Output - 123456******345
ex:
Input - 1234567890123456
Output - 123456******3456

So I tried something like the code below, and it worked.

if(length($(i+12))>=15)
  {sub(substr($(12+i),7,6),"******",$(12+i))}

But now the requirement has changed: only the first six and the last four digits should stay visible, and
everything in between (which varies with the length of the input) should be replaced by *, for any
input of length 15 or more.
Please advise how to achieve the above.

Something like this:

echo "1234567890000000000012345" | sed 's/\([0-9]\{6\}\)\(.*\)\([0-9]\{4\}\)/\1(\2)\3/' | awk -F"[()]" '{gsub(".","*",$2); print }' OFS="" 

Hey Panyam,

It's you again! Thanks a lot! But I need a bit more,

as the change has to be embedded in the following code:

 
ls *.txt | while read file ; do
awk -F: '/\+ABC/{for(i=0;++i<NF;){if($i~/\+ABC/&&length($(i+12))>=15){sub(substr($(12+i),7,6),"******",$(12+i))}}}1' OFS=":" $file > "$file"_encrypted
mv  "$file"_encrypted $file
done
 

The code above, if you remember (by chance), looks for ABC in the .txt file and applies the masking described
above to the 12th field after it.

perl -lne '/(\d{6})(\d+)(\d{4})$/; print $1, "*" x length $2, $3' file

$ echo "1234567890000000000012345"  |awk -F "" '{for (i=7;i<=(NF-4);i++) $i="*"}1' OFS=""
123456***************2345
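
As far as I know, splitting a line into single characters with -F "" is an extension found in gawk and some other awks and is not specified by POSIX. A minimal sketch of the same masking using only substr and length is shown below. The function name mask_digits is made up for illustration, and the pass-through of values shorter than 15 characters follows the requirement as stated in this thread.

```shell
# mask_digits VALUE: keep the first 6 and last 4 characters visible and
# replace everything in between with '*'. Values shorter than 15
# characters are printed unchanged (assumption taken from the thread).
mask_digits() {
    printf '%s\n' "$1" | awk '{
        n = length($0)
        if (n < 15) { print; next }
        stars = ""
        for (i = 7; i <= n - 4; i++) stars = stars "*"
        print substr($0, 1, 6) stars substr($0, n - 3)
    }'
}

mask_digits 1234567890000000000012345   # prints 123456***************2345
mask_digits 1234567890123456            # prints 123456******3456
```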


Provide a sample of your txt file here, and we can give you a one-line solution.

BAT:0310:2009-08-0:Y4   :H:D:00003721:03103721.IFH:00138770:05767:
00000000001279'
 
EXR:CLP:912.570000'
 
STA:A:9071559:2009-08-10::Ward::Mrs'
 
DEF::531.97:531.97:310221661617::+ABC:BAL:1:N::::5:40.00:0.00:2009-08-10:CN:11627877495099621::3:N:missc :N:PH:00010833:
0001+ABC:FPT:4:N::::5:19.99:0.00:2009-08-10:CN:1162 7987 9509 9621::3:N:miss c ross:N:AI:00220600:S3IA'
 
VDI:2004-03-12:133030431725:4:M:00001912:AT:BSP:9124029676:2004-05-06:Parker:4:12:::::I:::::N::129.00:129.00
:1234567887234567678:0:155.40::6:::::+TAX:UB:6.30+TAX:XT:15.10'
 
CTR:2009-08-10:0.00:0.00:30.00:30.00:7819.00:7819.00'
 
GTR:11.50:0.00:0.00:28457.81:149449.38:21298.48:154882.82:1725.89'
TRA'

I have a txt file as above, and I need to mask the middle digits of the credit card number so that only the first 6
and last 4 digits are visible.

The credit card number appears in the 12th position of the +ABC segment, with fields separated by :
(the 12th position can also hold things other than a credit card number, and those shouldn't be masked). The way to
identify a credit card number is the field before it: if the 11th field of the ABC section has any of these values
TXE,AF,XT,TT,IT,TX,DX,TY,DT,MO,SE,CF,AXE,DF,CX,TF,DE,XF,CNE,IX,CN,SC,XTE,AX,CX
then the credit card number is in the 12th position and needs to be masked.

The credit card number can have a varying number of digits (16, 17, 19, ...), and the digits can appear
together or separated by spaces.

The credit card number also appears in the 26th position of the VDI segment, with fields separated by :

eg
1162 7987 9509 9621
1162798795099621
1162 7987 9509 9621 1234
11627987950996211234

The output needs to be

1162 79********9621
116279******9621
1162 79*************1234
116279**********1234  

I used the code from my earlier post, but it doesn't handle
the varying number of digits or the 11th-field check of the ABC section.

Please advise how to achieve the above.
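
The masking rule described above (first six and last four digits visible, digits possibly separated by spaces) can be sketched roughly as follows. This is only an illustration: the name mask_card is hypothetical, and it assumes that the "15 or more" threshold from the earlier post counts digits rather than characters, and that spaces between the visible parts are masked too, as in the expected output listed above.

```shell
# mask_card VALUE: keep the first 6 and the last 4 digits visible and
# replace every character between them (spaces included) with '*'.
# Values with fewer than 15 digits are printed unchanged (assumption).
mask_card() {
    printf '%s\n' "$1" | awk '{
        s = $0; n = length(s)
        digits = s; gsub(/[^0-9]/, "", digits)
        if (length(digits) < 15) { print; next }
        # lo = position of the 6th digit counted from the left
        seen = 0
        for (lo = 1; lo <= n; lo++)
            if (substr(s, lo, 1) ~ /[0-9]/ && ++seen == 6) break
        # hi = position of the 4th digit counted from the right
        seen = 0
        for (hi = n; hi >= 1; hi--)
            if (substr(s, hi, 1) ~ /[0-9]/ && ++seen == 4) break
        stars = ""
        for (i = lo + 1; i < hi; i++) stars = stars "*"
        print substr(s, 1, lo) stars substr(s, hi)
    }'
}

mask_card "1162 7987 9509 9621"        # prints 1162 79********9621
mask_card "11627987950996211234"       # prints 116279**********1234
```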

It is a seriously bad idea, not to mention unlawful in most countries, to post people's credit card details in a forum like this. I can see the data looks old, but nonetheless you should sanitize the data first. I work for a credit card company, and they do scour the web looking for posts by their employees. If they found me posting this, I would be marched out the door. I am sure Mrs Ward and Miss Ross would be less than delighted to find this here.

This might be better off as a script, but it will do what you ask.

perl -pe 's/\n//g; s/\x27/\n/; s/^\s+//' infile | perl -F: -lane 'print $F[26] if /^VDI/; print $1 while ( /\+ABC(?::.*?){12}(.+?):/g )' | perl -lpe 's/\s+//g' | perl -lne '/(\d{6})(\d+)(\d{4})$/; print $1, "*" x length $2, $3'

Try this modification to rdcwayx's code:

awk -F "" '{j=0; for(i=1;i<7+j;i++)if($i==" ")j++; for(;i<=(NF-4);i++)$i="*"}1' OFS="" infile

The awk script above produces this output when run against the example data:

BAT:03********************************************************767:
000000*****279'

EXR:CL*********000'

STA:A:**************************Mrs'

DEF::5**************************************************************************************************************833:
0001+A*********************************************************************************************3IA'

VDI:20**************************************************************************************************9.00
:12345*****************************************************.10'

CTR:20*******************************************.00'

GTR:11********************************************************.89'
TRA'

DEF::5**************************************************************************************************************833:
0001+A*************************************************************************************************************3IA'

Here is what the Perl I posted produces:

perl -pe 's/\n//g; s/\x27/\n/; s/^\s+//' infile | perl -F: -lane 'print $F[26] if
/^VDI/; print $1 while ( /\+ABC(?::.*?){12}(.+?):/g )' | perl -lpe 's/\s+//g' | perl -lne '/(\d{6})(\d+)(\d{4})$/; print $1, "*" x length $2, $3'

116278*******9621
116279******9621
123456*********7678

I'll post a script. This is getting too silly for a one-liner.

The awk code was written for the original requirement, and there it produces:

1162 79********9621
116279******9621
1162 79*************1234
116279**********1234

To get this out of the revised specs, including the additional code restriction on field 11, similar to your Perl, I ended up with something crazy like this:

awk -F: '{gsub(/\n/,"")}1' RS="\'" infile2 |
awk -F'\+ABC' '/^ *VDI:/{print$1}NF>1{for (i=2;i<=NF;i++)print $i}' OFS='\n' |
awk -F: '{if ($27)print $27; else if ($12 ~ /^TXE$|^AF$|^XT$|^TT$|^IT$|^TX$|^DX$|^TY$|^DT$|^MO$|^SE$|^CF$|^AXE$|^DF$|^CX$|^TF$|^DE$|^XF$|^CNE$|^IX$|^CN$|^SC$|^XTE$|^AX$|^CX$/) print $13}' |
awk -F "" '{j=0; for(i=1;i<7+j;i++)if($i==" ")j++; for(;i<=(NF-4);i++)$i="*"}1' OFS=""

Output:

116278*******9621
1162 79********9621
123456*********7678

I am sure this can be further optimized :)
Anyone? :D

I created Perl that produces the correct output and that scales, provided the requester has posted *ALL* the use cases, so awk away; but the problem IS solved, in Perl.

It's solved in awk, too; you'd just have to golf it some more. Really, it should be a script at this point. Let's call it quits: we solved it in two languages, yay us! :)

Thanks, Unix and Perl gurus!

I am looking for the replacement to happen in the input file itself: once run,
the script should replace the digits with * in the input file. The code above gives me only the masked card numbers as separate output.

I need the whole file as output, with only the credit card numbers masked.

Please advise.
Thanks in advance!
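
One possible way to keep the whole file and mask only the card fields is sketched below. It is an untested-in-production illustration, not a definitive answer, and it rests on several assumptions read off the sample data: records end with an apostrophe, the file has a newline after the last apostrophe, the VDI card sits in the 27th colon-separated field when the VDI tag is counted as field 1, and for +ABC the code and card are 11 and 12 fields after the field ending in +ABC. The function name mask_file and the temporary file name are made up.

```shell
# mask_file FILE: rewrite FILE in place with card numbers masked.
mask_file() {
    tmp="$1.masked.$$"
    awk -v RS="'" -v q="'" '
    # Keep the first 6 and last 4 digits; mask everything between them.
    function mask(s,    n, d, lo, hi, seen, i, stars) {
        n = length(s); d = s; gsub(/[^0-9]/, "", d)
        if (length(d) < 15) return s
        seen = 0
        for (lo = 1; lo <= n; lo++)
            if (substr(s, lo, 1) ~ /[0-9]/ && ++seen == 6) break
        seen = 0
        for (hi = n; hi >= 1; hi--)
            if (substr(s, hi, 1) ~ /[0-9]/ && ++seen == 4) break
        stars = ""
        for (i = lo + 1; i < hi; i++) stars = stars "*"
        return substr(s, 1, lo) stars substr(s, hi)
    }
    BEGIN {
        # Field-11 codes that mark field 12 as a card number (from the post).
        codes = "TXE,AF,XT,TT,IT,TX,DX,TY,DT,MO,SE,CF,AXE,DF,CX,TF,DE,XF,CNE,IX,CN,SC,XTE,AX"
        m = split(codes, c, ",")
        for (k = 1; k <= m; k++) iscode[c[k]] = 1
    }
    {
        nf = split($0, f, ":")
        tag = f[1]; gsub(/[ \t\n]/, "", tag)
        if (tag == "VDI" && nf >= 27) f[27] = mask(f[27])
        for (i = 1; i + 12 <= nf; i++)
            if (f[i] ~ /\+ABC$/ && (f[i + 11] in iscode)) f[i + 12] = mask(f[i + 12])
        # Re-join the fields and restore the record terminator.
        out = f[1]
        for (i = 2; i <= nf; i++) out = out ":" f[i]
        printf "%s%s", (NR > 1 ? q : ""), out
    }' "$1" > "$tmp" && mv "$tmp" "$1"
}

# Usage, mirroring the loop from the earlier post:
# for file in *.txt; do mask_file "$file"; done
```

Because the records are split on the apostrophe rather than on newlines, a record that wraps across lines is still handled as one unit, and line breaks outside the masked span are written back unchanged. It would be wise to try this on a copy of the data first.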