How to replace all but the first 3 characters with sed?

leolson · February 6, 2015, 5:39pm

This seems like it should be an easy problem, but for some reason I am struggling with the solution.
I simply want to replace all characters after the first 3 characters with another character, preferably with sed.
Thanks in advance.

Like this, but producing the proper number of *'s:

sed "s/\(.\{3\}\).*/\1*****/"

Input:

abababab
cdcdcdcdcd
efefefefe

Output:

aba*****
cdc*******
efe******

RudiC · February 6, 2015, 5:49pm

Try

sed -E 's/(...)./\1*/; :A s/[*][^*]/**/; tA' file2
aba*****
cdc*******
efe******

For legibility I've used the -E (= -r) option so the parentheses etc. don't need to be escaped.

Scrutinizer · February 7, 2015, 1:29am

Try:

sed -e :a -e 's/\(...\)[^*]/\1*/;ta' file

GNU sed:

sed ':a;s/\(...\)[^*]/\1*/;ta' file

--
Can it be perl?

perl -pe 's/(?<=...)./*/g' file

leolson · February 9, 2015, 1:24pm

Thanks. Although this works fine for the case I presented, I noticed that there could be an issue if one of the first 3 characters was a * to begin with.

In:

a*a*a*a*

Out:

a*******

Thanks! These appear to work just fine.
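
The difference between the two loop variants on input that already contains asterisks can be checked directly. The following is only a sketch, assuming a sed that accepts -E (GNU and BSD sed do). The function names spread_loop and positional_loop are made up for this illustration and do not appear in the posts above.

```shell
# RudiC's loop: every "*" followed by a non-"*" becomes "**",
# so an asterisk among the first three characters spreads to the right.
spread_loop() {
  sed -E -e 's/(...)./\1*/' -e :A -e 's/[*][^*]/**/' -e tA
}

# Scrutinizer's loop: only a non-"*" character that has three
# characters in front of it is replaced, so the prefix is left alone.
positional_loop() {
  sed -e :a -e 's/\(...\)[^*]/\1*/' -e ta
}

printf 'a*a*a*a*\n' | spread_loop       # a*******
printf 'a*a*a*a*\n' | positional_loop   # a*a*****
```

Both functions read from standard input, so they can also be fed a file with "spread_loop < file".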

sea · February 9, 2015, 2:28pm

Not as nice, but might be easier to maintain:

while read line
do
	len=${#line}
	[ "$len" -gt 3 ] && len=$(( len - 3 )) || len=0
	tmp="$(printf '%*s' $len|sed s,\ ,*,g)"
	echo "${line:0:3}${tmp}"
done<file1

$ cat file1
abababab
cdcdcdcdcd
efefefefe
a*bcdef

$ bash leolson
aba*****
cdc*******
efe******
a*b****

hth

Scrutinizer · February 9, 2015, 3:05pm

@sea: note that ${line:0:3} is bash syntax (also ksh93 and zsh), so the script should be called with bash rather than sh. On some systems sh is a link to bash, which results in bash being called as if with the --posix option, but there are many systems where this is not the case, and there the script would fail if called this way.
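
For a script that has to run under plain sh, the substring can be taken with POSIX parameter expansion instead. This is a rough sketch rather than a drop-in replacement for the script above: mask_line is a hypothetical helper name, and it assumes single-byte characters, since not every sh counts multibyte characters with "?".

```shell
# Mask everything after the first three characters of one line,
# using only POSIX parameter expansion (no ${line:0:3}).
mask_line() {
  line=$1
  case $line in
    ????*)
      rest=${line#???}            # text after the first three characters
      head=${line%"$rest"}        # the first three characters
      stars=$(printf '%s\n' "$rest" | sed 's/./*/g')
      printf '%s%s\n' "$head" "$stars"
      ;;
    *)
      printf '%s\n' "$line"       # three characters or fewer: nothing to mask
      ;;
  esac
}

while IFS= read -r line; do
  mask_line "$line"
done <<'EOF'
abababab
a*bcdef
ab
EOF
```

With the three sample lines this prints aba*****, a*b**** and ab.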

wisecracker · February 9, 2015, 5:08pm

Longhand using OSX 10.7.5, default bash terminal...
Hard coded as '3' for this DEMO.

Last login: Mon Feb  9 22:00:29 on ttys000
AMIGA:barrywalker~> echo 'kajhd(*&&#$%^ASDFGHJ{}][!\\|\00
> 123131,.,xv.,c.,.?><lksdlfk
> abcd
> def
> a' > /tmp/txt
AMIGA:barrywalker~> while read -r line; do pad="${line:3}"; echo $line; echo -E "${line:0:3}${pad//[' '-~]/*}"; done < /tmp/txt
kajhd(*&&#$%^ASDFGHJ{}][!\\|\00
kaj****************************
123131,.,xv.,c.,.?><lksdlfk
123************************
abcd
abc*
def
def
a
a
AMIGA:barrywalker~> _

EDIT:
Post the correct copy and paste... ;o)

disedorgue · February 9, 2015, 5:49pm

Hi,
Beware of bash's builtin 'read': if you specify a variable name, the spaces at the beginning of a line are lost. Example:

$ read -r line <<<"    foobar"
$ echo "$line"
foobar
$ read -r <<<"    foobar"
$ echo "$REPLY"
    foobar
$

Is it a bug? I don't know.
Regards.

Scrutinizer · February 9, 2015, 10:31pm

Good point. This is not a bug, it is because of the IFS internal field separator.

$ IFS= read -r line <<<"    foobar"
$ printf "%s\n" "$line" 
    foobar
$

To preserve the content of a file, including spaces, while processing it line by line in the shell:

Set IFS to "" locally, for the read operation only.
Use -r so that read does not interpret backslashes.
Use printf rather than echo, so that a particular version of echo cannot interpret the content of the variable and the output comes out right.
Put double quotes around the variable expansion to prevent field splitting and globbing.

So:

while IFS= read -r line
do
  printf "%s\n" "$line"
done < file

Since the IFS variable is set only for the read operation, it keeps its original value afterwards.

anbu23 · February 10, 2015, 12:01am

$ cat file
abababab
cdcdcdcdcd
efefefefe
$ sed "s/./*/4g" file
aba*****
cdc*******
efe******

Don_Cragun · February 10, 2015, 12:20am

Unfortunately, the standards say the results are unspecified if you have both a number and a g flag for a sed substitute command. On some systems, you will get what you showed above. On others, you'll get something like:

sed: 1: "s/./*/4g": more than one number or 'g' in substitute flags

as is the result on OS X.
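
One way to live with that difference is to probe the local sed first and fall back to a portable loop when the flag combination is rejected. This is only a sketch under that assumption; mask_after3 is a made-up name, and the fallback is the one-character-per-pass loop posted earlier in this thread.

```shell
# Use the "4g" flag combination where the local sed accepts it,
# otherwise fall back to the portable one-character-per-pass loop.
mask_after3() {
  if printf 'abcde\n' | sed 's/./*/4g' 2>/dev/null | grep -qx 'abc[*][*]'; then
    sed 's/./*/4g'
  else
    sed -e :a -e 's/\(...\)[^*]/\1*/' -e ta
  fi
}

printf 'cdcdcdcdcd\n' | mask_after3   # cdc*******
```

The probe runs on its own small input, so the data on standard input is still available to whichever sed command is chosen.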

wisecracker · February 10, 2015, 2:44am

Hi guys...

I did not forget IFS="" to account for leading spaces; I simply followed the OP's post, where every line starts with an alphabetic character, while allowing for spaces inside a line, and that worked.
Yes, there are spaces at the end of one line.

However, with the extra lines added (note that IFS is not saved and restored in this DEMO):

AMIGA:barrywalker~> IFS=""
AMIGA:barrywalker~> echo 'kajhd(*&&#$%^ASDFGHJ{}][!\\|\00
123131,.,xv.,c.,.?><lksdlfk
     barry
A*b*C***Klom
abnm
n n n n n
 n n n n n' > /tmp/txt
AMIGA:barrywalker~> while read -r line; do pad="${line:3}"; echo $line; echo -E "${line:0:3}${pad//[' '-~]/*}"; done < /tmp/txt
kajhd(*&&#$%^ASDFGHJ{}][!\\|\00
kaj****************************
123131,.,xv.,c.,.?><lksdlfk
123************************
     barry
   *******
A*b*C***Klom
A*b*********
abnm
abn*
n n n n n
n n******
 n n n n n   
 n **********
AMIGA:barrywalker~> _

MadeInGermany · February 10, 2015, 4:27am

With awk:

awk '{x=substr($0,N+1); gsub(".","*",x); print substr($0,1,N) x}' N=3 file

Scrutinizer · February 10, 2015, 11:37am

The combination of a number flag and a g flag is a GNU sed extension. Still, it is a nice solution, when GNU sed is available...

DGPickett · February 10, 2015, 12:15pm

I hate sed loops that change one byte per iteration, so instead you can split the lines and then rejoin them:

sed '
  s/^.\{0,3\}/&\
/
' | sed '
  $q
  N
  P
  s/.*\n//
  s/./*/g
'|sed '
  $q
  N
  s/\n//
'

RudiC · February 10, 2015, 12:52pm

Point taken! I don't really like pipe organs, so tried a no loop no pipe solution:

sed 'h;s/\(...\).*/\1/;x;s/^...//;s/./*/g;H;x;s/\n//' file
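
Spelled out one command per line, the hold-space approach reads as follows. This is an annotated sketch of the same idea, not RudiC's exact script: the wrapper name mask_hold_space is hypothetical, and the first command is a guard added here so that lines of three characters or fewer pass through unchanged.

```shell
mask_hold_space() {
  sed '
    # Guard added for this sketch: leave lines of three characters or fewer alone.
    /^.\{0,3\}$/b
    # Copy the whole line to the hold space.
    h
    # Pattern space: keep only the first three characters.
    s/\(...\).*/\1/
    # Swap: pattern space = full line, hold space = prefix.
    x
    # Drop the prefix, then mask what is left.
    s/^...//
    s/./*/g
    # Append the mask to the prefix in the hold space (H inserts a newline).
    H
    # Swap back and remove that newline.
    x
    s/\n//
  '
}

printf 'abababab\nab\n' | mask_hold_space
```

The two sample lines come out as aba***** and ab.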

Scrutinizer · February 10, 2015, 3:38pm

Some more awk variations:

awk -F'^...' '{a=$2; gsub(/./,"*",a); print substr($0,1,3) a}' file

GNU:

awk '{gsub(/./,"*",$2)}1' FIELDWIDTHS="3 99" OFS= file

GNU/mawk/BSD awk:

awk '{for(i=4; i<=NF; i++) $i="*"}1' FS= OFS= file

Scrutinizer · February 10, 2015, 9:37pm

Did a small test and indeed, those 1 character loops are expensive.
It turns out the awk solutions appear to be the fastest.

$ time sed -e :a -e 's/\(...\)[^*]/\1*/;ta' file > /dev/null

real	0m0.114s
user	0m0.110s
sys	0m0.003s

$ time { sed '
  s/^.\{0,3\}/&\
/
' file | sed '
  $q
  N
  P
  s/.*\n//
  s/./*/g
'|sed '
  $q
  N
  s/\n//
' ;} > /dev/null

real	0m0.015s
user	0m0.015s
sys	0m0.006s

$ time sed 'h;s/\(...\).*/\1/;x;s/^...//;s/./*/g;H;x;s/\n//' file > /dev/null

real	0m0.018s
user	0m0.013s
sys	0m0.002s

$ time perl -pe 's/(?<=...)./*/g' file >/dev/null

real	0m0.030s
user	0m0.022s
sys	0m0.005s

$ time awk '{x=substr($0,N+1); gsub(".","*",x); print substr($0,1,N) x}' N=3 file > /dev/null

real	0m0.010s
user	0m0.007s
sys	0m0.002s

$ time gsed "s/./*/4g" file > /dev/null

real	0m0.022s
user	0m0.018s
sys	0m0.002s

$ time awk -F'^...' '{a=$2; gsub(/./,"*",a); print substr($0,1,3) a}' file > /dev/null

real	0m0.009s
user	0m0.006s
sys	0m0.003s

DGPickett · February 12, 2015, 10:43am

The '\(...\)' and '\1' feature has been known to be a bit slow relative to simpler choices.

Maybe awk could get the length of the line - 3 and substring a string of "***************" to produce the rest of the line, faster? Or do you need perl/python/ruby for that?
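
The idea can be tried in awk without another language. A minimal sketch, with no claim about its speed: mask_with_substr and the keep variable are names invented for this example, and the string of asterisks is doubled on demand so that long lines are covered as well.

```shell
# Cut the mask out of a prebuilt string of asterisks instead of
# substituting character by character.
mask_with_substr() {
  awk -v keep=3 '
    BEGIN { stars = "****************" }
    {
      need = length($0) - keep
      # Short lines have nothing to mask.
      if (need <= 0) { print; next }
      # Grow the pool of asterisks until it is long enough.
      while (length(stars) < need) stars = stars stars
      print substr($0, 1, keep) substr(stars, 1, need)
    }'
}

printf 'efefefefe\n' | mask_with_substr   # efe******
```

Because the pool of asterisks is kept between lines, it only has to grow when a longer line than any seen so far turns up.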

sea · February 12, 2015, 11:41am

Irritating: when I run my code using time, I get:

+ ~/tmp $ cat ./leolson 
while read line
do
	len=${#line} ; [ $len -gt 3 ] && len=$(( $len - 3))
	tmp="$(printf '%*s' $len|sed s,\ ,*,g)"
	echo "${line:0:3}${tmp}"
done<file1

+ ~/tmp $ time bash ./leolson
aba*****
cdc*******
efe******
a*b****

real	0m0.007s
user	0m0.003s
sys	0m0.006s

I did a few runs, and the real time tends to fall around 0.006-0.011s rather than 0.012s or more.
(0.006 and 0.007 came up more often than all other values together.)

However, with MadeInGermany's awk code, timed at 0.010s above, I get 0.001s:

time awk '{x=substr($0,N+1); gsub(".","*",x); print substr($0,1,N) x}' N=3 file1 > /dev/null

real	0m0.001s
user	0m0.000s
sys	0m0.001s

And your perl code behaves quite irrationally:

+ ~/tmp $ time perl -pe 's/(?<=...)./*/g' file1 >/dev/null

real	0m0.119s
user	0m0.003s
sys	0m0.002s
+ ~/tmp $ time perl -pe 's/(?<=...)./*/g' file1 >/dev/null

real	0m0.006s
user	0m0.002s
sys	0m0.004s

I find the time difference quite immense and confusing.
Sure, a difference of a few milliseconds can happen, but by a factor of 19.8:1?