How to replace all but the first 3 characters with sed?

This seems like it should be an easy problem, but for some reason I am struggling with the solution.
I simply want to replace all characters after the first 3 characters with another character, preferably with sed.
Thanks in advance.

Like this, but producing the proper number of *'s:

sed "s/\(.\{3\}\).*/\1*****/"

Input:

abababab
cdcdcdcdcd
efefefefe

Output:

aba*****
cdc*******
efe******

Try

sed -E 's/(...)./\1*/; :A; s/[*][^*]/**/; tA' file2
aba*****
cdc*******
efe******

For legibility I've used the -E (= -r) option so the parentheses etc. don't need to be escaped.


Try:

sed -e :a -e 's/\(...\)[^*]/\1*/;ta' file

GNU sed:

sed ':a;s/\(...\)[^*]/\1*/;ta' file

--
Can it be perl?

perl -pe 's/(?<=...)./*/g' file

Thanks. Although this works fine for the case I presented, I noticed that there could be an issue if one of the first 3 characters was a * to begin with.

In:

a*a*a*a*

Out:

a*******

Thanks! These appear to work just fine.

Not as nice, but might be easier to maintain:

while read line
do
	len=${#line}
	len=$(( len > 3 ? len - 3 : 0 ))	# characters to mask; 0 for lines of 3 characters or fewer
	tmp="$(printf '%*s' "$len" '' | sed 's/ /*/g')"
	echo "${line:0:3}${tmp}"
done<file1

$ cat file1
abababab
cdcdcdcdcd
efefefefe
a*bcdef

$ bash leolson
aba*****
cdc*******
efe******
a*b****

hth

@sea: note that ${line:0:3} is bash syntax (also supported by ksh93 and zsh), so the script should be called with bash rather than sh. On some systems sh is a link to bash, which results in bash running in POSIX mode, where the expansion still works. On many other systems sh is a different shell, and there the script would fail if called this way.
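
For what it's worth, here is a minimal sketch of the same idea that avoids the bash-only ${line:0:3} expansion, so it should also run under a plain POSIX sh. The function name mask_line and the here-document input are only for illustration. In some shells (dash, for example) the ? pattern counts bytes rather than characters, so treat it as safe for single-byte text only.

```shell
#!/bin/sh
# mask_line: hypothetical helper name, not taken from the posts above.
# Prints $1 with everything after the first 3 characters replaced by '*'.
mask_line() {
	line=$1
	case $line in
	????*)
		# At least 4 characters: split the line with POSIX expansions.
		rest=${line#???}        # everything after the first 3 characters
		head=${line%"$rest"}    # the first 3 characters
		mask=$(printf '%s' "$rest" | sed 's/./*/g')
		printf '%s%s\n' "$head" "$mask"
		;;
	*)
		# 3 characters or fewer: nothing to mask.
		printf '%s\n' "$line"
		;;
	esac
}

# IFS= and -r keep leading spaces and backslashes intact.
while IFS= read -r line
do
	mask_line "$line"
done <<'EOF'
abababab
a*bcdef
def
EOF
```

With the sample input in the here-document this should print aba*****, a*b**** and def.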


Longhand using OSX 10.7.5, default bash terminal...
Hard coded as '3' for this DEMO.

Last login: Mon Feb  9 22:00:29 on ttys000
AMIGA:barrywalker~> echo 'kajhd(*&&#$%^ASDFGHJ{}][!\\|\00
> 123131,.,xv.,c.,.?><lksdlfk
> abcd
> def
> a' > /tmp/txt
AMIGA:barrywalker~> while read -r line; do pad="${line:3}"; echo $line; echo -E "${line:0:3}${pad//[' '-~]/*}"; done < /tmp/txt
kajhd(*&&#$%^ASDFGHJ{}][!\\|\00
kaj****************************
123131,.,xv.,c.,.?><lksdlfk
123************************
abcd
abc*
def
def
a
a
AMIGA:barrywalker~> _

EDIT:
Post the correct copy and paste... ;o)

Hi,
Beware of bash's builtin 'read': if you specify a variable name, the spaces at the beginning of the line are lost. Example:

$ read -r line <<<"    foobar"
$ echo "$line"
foobar
$ read -r <<<"    foobar"
$ echo "$REPLY"
    foobar
$ 

Is it a bug? I don't know.
Regards.


Good point. This is not a bug; it is because of IFS, the internal field separator. When read assigns to a named variable, it strips leading and trailing IFS whitespace.

$ IFS= read -r line <<<"    foobar"
$ printf "%s\n" "$line" 
    foobar
$

To preserve content of a file including spaces while processing line by line in shell,

  1. Set IFS to "", local to the read operation.
  2. Use -r to avoid interpretation of \.
  3. Use printf rather than echo, so that a particular version of echo cannot interpret the content of the variable and the output comes out right.
  4. Put double quotes around the variable expansion to prevent field splitting and globbing.

So:

while IFS= read -r line
do
  printf "%s\n" "$line"
done < file

Since the IFS variable is set local to the read operation, it retains its original value afterwards.

$ cat file
abababab
cdcdcdcdcd
efefefefe
$ sed "s/./*/4g" file
aba*****
cdc*******
efe******

Unfortunately, the standards say the results are unspecified if you have both a number and a g flag for a sed substitute command. On some systems, you will get what you showed above. On others, you'll get something like:

sed: 1: "s/./*/4g": more than one number or 'g' in substitute flags

as is the result on OS X.


Hi guys...

I did not forget IFS="" to account for leading spaces; I just followed the OP's post, where every line starts with alphabetic characters, but allowed for spaces inside a line, which worked.
Yes, there are spaces at the end of one line.

However, with the extra lines added (note that IFS is not saved and restored in this DEMO):

AMIGA:barrywalker~> IFS=""
AMIGA:barrywalker~> echo 'kajhd(*&&#$%^ASDFGHJ{}][!\\|\00
123131,.,xv.,c.,.?><lksdlfk
     barry
A*b*C***Klom
abnm
n n n n n
 n n n n n' > /tmp/txt
AMIGA:barrywalker~> while read -r line; do pad="${line:3}"; echo $line; echo -E "${line:0:3}${pad//[' '-~]/*}"; done < /tmp/txt
kajhd(*&&#$%^ASDFGHJ{}][!\\|\00
kaj****************************
123131,.,xv.,c.,.?><lksdlfk
123************************
     barry
   *******
A*b*C***Klom
A*b*********
abnm
abn*
n n n n n
n n******
 n n n n n   
 n **********
AMIGA:barrywalker~> _

With awk:

awk '{x=substr($0,N+1); gsub(".","*",x); print substr($0,1,N) x}' N=3 file

The combination of a number flag and a g flag is a GNU sed extension. Still, it is a nice solution, when GNU sed is available...

I hate sed loops for one byte change per loop, so you can split the lines and then rejoin them:

sed '
  s/^.\{0,3\}/&\
/
' | sed '
  $q
  N
  P
  s/.*\n//
  s/./*/g
'|sed '
  $q
  N
  s/\n//
'

Point taken! I don't really like pipe organs, so tried a no loop no pipe solution:

sed 'h;s/\(...\).*/\1/;x;s/^...//;s/./*/g;H;x;s/\n//' file
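
In case the one-liner is hard to follow, here is the same sequence of commands spelled out with comments and wrapped in a shell function (mask_tail is a made-up name). The first command is an addition, not part of the one-liner: without it, a line shorter than 3 characters such as ab would come out as ab**, because neither of the first two substitutions matches it.

```shell
#!/bin/sh
# mask_tail: hypothetical wrapper name; reads lines on stdin.
mask_tail() {
	sed '
		# Added guard: lines of 3 characters or fewer are printed unchanged.
		/^.\{0,3\}$/b
		# Copy the whole line to the hold space.
		h
		# Pattern space: keep only the first 3 characters.
		s/^\(...\).*/\1/
		# Swap: pattern space = whole line, hold space = first 3 characters.
		x
		# Drop the first 3 characters, then mask everything that is left.
		s/^...//
		s/./*/g
		# Append the masked tail to the hold space (joined by a newline).
		H
		# Swap back and remove the joining newline.
		x
		s/\n//
	'
}

printf 'abababab\ncdcdcdcdcd\nefefefefe\n' | mask_tail
```

Because the first 3 characters are set aside before anything is masked, a * among them is left alone.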

Some more awk variations:

awk -F'^...' '{a=$2; gsub(/./,"*",a); print substr($0,1,3) a}' file

GNU:

awk '{gsub(/./,"*",$2)}1' FIELDWIDTHS="3 99" OFS= file

GNU/mawk/BSD awk

awk '{for(i=4; i<=NF; i++) $i="*"}1' FS= OFS= file

Did a small test and indeed, those one-character loops are expensive.
It turns out the awk solutions appear to be the fastest.

$ time sed -e :a -e 's/\(...\)[^*]/\1*/;ta' file > /dev/null

real	0m0.114s
user	0m0.110s
sys	0m0.003s

$ time { sed '
  s/^.\{0,3\}/&\
/
' file | sed '
  $q
  N
  P
  s/.*\n//
  s/./*/g
'|sed '
  $q
  N
  s/\n//
' ;} > /dev/null

real	0m0.015s
user	0m0.015s
sys	0m0.006s

$ time sed 'h;s/\(...\).*/\1/;x;s/^...//;s/./*/g;H;x;s/\n//' file > /dev/null

real	0m0.018s
user	0m0.013s
sys	0m0.002s

$ time perl -pe 's/(?<=...)./*/g' file >/dev/null

real	0m0.030s
user	0m0.022s
sys	0m0.005s

$ time awk '{x=substr($0,N+1); gsub(".","*",x); print substr($0,1,N) x}' N=3 file > /dev/null

real	0m0.010s
user	0m0.007s
sys	0m0.002s

$ time gsed "s/./*/4g" file > /dev/null

real	0m0.022s
user	0m0.018s
sys	0m0.002s

$ time awk -F'^...' '{a=$2; gsub(/./,"*",a); print substr($0,1,3) a}' file > /dev/null

real	0m0.009s
user	0m0.006s
sys	0m0.003s

The '\(...\)' and '\1' feature has been known to be a bit slow relative to simpler choices.

Maybe awk could get the length of the line - 3 and substring a string of "***************" to produce the rest of the line, faster? Or do you need perl/python/ruby for that?
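
A rough sketch of that idea in plain awk, assuming a star string that is built once and grown by doubling whenever a longer line turns up. The function name mask_awk and the variable names keep, n and stars are made up for illustration, and whether this is actually faster than the gsub versions has not been measured here.

```shell
#!/bin/sh
# mask_awk: hypothetical wrapper name; reads lines on stdin.
mask_awk() {
	awk -v keep=3 '
	{
		# Number of characters to mask on this line.
		n = length($0) - keep
		if (n <= 0) { print; next }
		# Grow the reusable star string by doubling until it is long enough.
		while (length(stars) < n)
			stars = (stars == "" ? "*" : stars stars)
		# First "keep" characters, then n stars cut from the prebuilt string.
		print substr($0, 1, keep) substr(stars, 1, n)
	}'
}

printf 'abababab\ncdcdcdcdcd\nefefefefe\n' | mask_awk
```

No regular expression is involved, so the work per line is two substr calls plus the occasional doubling of the star string.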

Irritating: when I run my code using time, I get:

+ ~/tmp $ cat ./leolson 
while read line
do
	len=${#line} ; len=$(( len > 3 ? len - 3 : 0 ))
	tmp="$(printf '%*s' "$len" '' | sed 's/ /*/g')"
	echo "${line:0:3}${tmp}"
done<file1

+ ~/tmp $ time bash ./leolson
aba*****
cdc*******
efe******
a*b****

real	0m0.007s
user	0m0.003s
sys	0m0.006s

I did a few runs, and the real time tends to fall around 6-11 ms rather than 12+ ms
(0.006 and 0.007 came up more often than everything else together).

Though with MadeInGermany's 0.010 sec awk code I get 0.001 :)

time awk '{x=substr($0,N+1); gsub(".","*",x); print substr($0,1,N) x}' N=3 file1 > /dev/null

real	0m0.001s
user	0m0.000s
sys	0m0.001s

And your perl code behaves quite irrationally:

+ ~/tmp $ time perl -pe 's/(?<=...)./*/g' file1 >/dev/null

real	0m0.119s
user	0m0.003s
sys	0m0.002s
+ ~/tmp $ time perl -pe 's/(?<=...)./*/g' file1 >/dev/null

real	0m0.006s
user	0m0.002s
sys	0m0.004s

I find the time difference quite immense and confusing.
Sure, a difference of a few milliseconds can happen, but by a factor of 19.8:1?