Input file:
>Read_1
XXXXXXXXXXSDFXXXXXDS (condition 1: if 3 or fewer letters follow the last "X" on a line, replace those non-"X" letters with "X")
TREXXXXXXXSDFXXXXXDS (condition 2: if 3 or fewer letters precede the first "X" on a line, replace those non-"X" letters with "X")
.
.
Desired output:
>Read_1
XXXXXXXXXXSDFXXXXXXX
XXXXXXXXXXSDFXXXXXXX
.
.
I tried the following to solve condition 1, but it did not work:
perl -pe 's/X[^X]{2}$/XXX/g' input
Thanks for any advice.
$
$
$ cat f25
>Read_1
XXXXXXXXXXSDFXXXXXDS
TREXXXXXXXSDFXXXXXDS
XXXXXXXXXXABCXXXXXXW
AQXXXXXXXXPQRXXXXXXX
XXXXXXXXXXDEFXXXXXXX
$
$
$ perl -lne 'if (/^([^X]{1,3})(X.*)/){($a,$b)=($1,$2); $a=~s/./X/g; print "$a$b"}
elsif (/^(.*X)([^X]{1,3})$/){($a,$b)=($1,$2); $b=~s/./X/g; print "$a$b"}
else {print}
' f25
>Read_1
XXXXXXXXXXSDFXXXXXXX
XXXXXXXXXXSDFXXXXXDS
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXPQRXXXXXXX
XXXXXXXXXXDEFXXXXXXX
$
$
tyler_durden
With sed:
sed -e 's/\(^X.*\)XDS$/\1XXX/g' -e 's/^TREX\(.*\)/XXXX\1/g' inputfile
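The sed command above hard-codes TRE and DS, so it only fits the two sample lines. A more general sketch, assuming the rule is "a run of 1 to 3 non-X letters at either end" and that header lines start with ">" (both read off the sample data; the function name mask_ends_sed is made up for illustration), could spell out each run length with plain POSIX basic regular expressions:

```shell
# mask_ends_sed: hypothetical helper name, not from the thread.
# Reads lines on stdin. Header lines (starting with ">") pass through.
# On other lines, a run of 1 to 3 non-X letters before the first X or
# after the last X is overwritten with X. Longer runs are left alone.
mask_ends_sed() {
    sed -e '/^>/b' \
        -e 's/^[^X]\{3\}X/XXXX/' \
        -e 's/^[^X]\{2\}X/XXX/' \
        -e 's/^[^X]X/XX/' \
        -e 's/X[^X]\{3\}$/XXXX/' \
        -e 's/X[^X]\{2\}$/XXX/' \
        -e 's/X[^X]$/XX/'
}

printf '%s\n' '>Read_1' 'XXXXXXXXXXSDFXXXXXDS' 'TREXXXXXXXSDFXXXXXDS' | mask_ends_sed
```

On the two sample lines this should print XXXXXXXXXXSDFXXXXXXX twice, after the untouched header. Listing the three lengths separately is clumsy, but it avoids any extended-regex or GNU-only features.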
Hi durden_tyler,
Thanks for your reply.
I have edited my previous post a little to correct some small mistakes.
Do you have any idea to archive it?
Whenever 3 or fewer non-"X" letters come before the first "X" or after the last "X",
I want to replace those letters with "X".
Thanks again for your help.
---------- Post updated at 01:35 AM ---------- Previous update was at 01:29 AM ----------
Hi michaelrozar17,
Do you have any idea or suggestion to achieve my goal?
I have edited my previous post a little.
Thanks in advance for your advice.
As I understand it, you need TREXXXXXXXSDFXXXXXDS to be replaced with XXXXXXXXXXSDFXXXXXXX:
sed -e 's/\(^X.*\)XDS$/\1XXX/g' -e 's/^TREX\(.*\)XDS/XXXX\1XXX/g' inputfile
Hi michaelrozar17,
My input file is a long list of data, and "Read_1" is just part of it.
I am just not sure how to archive it automatically when my input is a long list of data.
Quite confused now. What is your requirement? Do you need to replace those *XXX* or to archive? If to archive, please elaborate on what the "archive" you mention here means.
Sorry for confusing you.
Whenever 3 or fewer non-"X" letters come before the first "X" or after the last "X",
I want to replace those letters with "X".
# cat infile
XXXXXXXXXXSDFXXXXXDS
TREXXXXXXXSDFXXXXXDS
# ./justdoit infile
XXXXXXXXXXSDFXXXXXXX
XXXXXXXXXXSDFXXXXXXX
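## justdoit ##
#!/bin/bash
# Replace runs of 1 to 3 non-X letters before the first X and after the last X.
# Assumes each line holds only letters (no blanks or glob characters).
while IFS= read -r l ; do
    # lines without any X (such as the ">Read_1" header) are printed unchanged
    case $l in *X*) ;; *) printf '%s\n' "$l" ; continue ;; esac
    # split the line into an array with one character per element
    x=( $(printf '%s\n' "$l" | fold -w1) )
    n=${#x[@]}
    # fdst: number of non-X letters before the first X
    fdst=0
    while [ "${x[fdst]}" != "X" ] ; do fdst=$(( fdst + 1 )) ; done
    # ltdst: number of non-X letters after the last X
    ltdst=0
    while [ "${x[n-1-ltdst]}" != "X" ] ; do ltdst=$(( ltdst + 1 )) ; done
    # overwrite each run only when it is at most 3 letters long
    if [ "$fdst" -le 3 ] ; then
        for (( i = 0 ; i < fdst ; i++ )) ; do x[i]=X ; done
    fi
    if [ "$ltdst" -le 3 ] ; then
        for (( i = n - ltdst ; i < n ; i++ )) ; do x[i]=X ; done
    fi
    # join the array back into one line
    printf '%s' "${x[@]}" ; echo
done <"$1"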
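The script above walks each line character by character. The same idea can also be sketched with POSIX parameter expansion alone, with no arrays, fold, or temporary file. This is only an illustration under the assumptions that header lines start with ">" and that a run is masked only when it is 1 to 3 letters long. The function name mask_ends_sh and its variables are made up here.

```shell
# mask_ends_sh: hypothetical helper name, not from the thread.
# Prints its argument with a run of 1 to 3 non-X letters before the first X
# and after the last X replaced by X. Works in any POSIX shell.
mask_ends_sh() {
    line=$1
    case $line in
        '>'*) printf '%s\n' "$line"; return ;;   # header line: leave alone
        *X*)  ;;                                 # contains an X: process it
        *)    printf '%s\n' "$line"; return ;;   # no X at all: leave alone
    esac
    head=${line%%X*}          # everything before the first X
    tail=${line##*X}          # everything after the last X
    mid=${line#"$head"}       # drop the head ...
    mid=${mid%"$tail"}        # ... and the tail, keeping the middle part
    # mask a run only when it is 1, 2 or 3 letters long
    case ${#head} in 1) head=X ;; 2) head=XX ;; 3) head=XXX ;; esac
    case ${#tail} in 1) tail=X ;; 2) tail=XX ;; 3) tail=XXX ;; esac
    printf '%s\n' "$head$mid$tail"
}

while IFS= read -r l; do mask_ends_sh "$l"; done <<'EOF'
>Read_1
XXXXXXXXXXSDFXXXXXDS
TREXXXXXXXSDFXXXXXDS
EOF
```

For a long input file the loop would read from the file instead of the here-document, although a sed, awk or perl one-liner will be much faster on large data.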
Not sure what you mean by "idea to archive it" - I don't see anything about archiving in your post. You probably mean "achieve it".
In any case, the script that follows takes care of the case where a line could have a non-X letter on each end.
$
$
$ cat f25
>Read_1
XXXXXXXXXXABCXXXXXXS
XXXXXXXXXXABCXXXXXRS
XXXXXXXXXXABCXXXXQRS
XXXXXXXXXXABCXXXPQRS
PXXXXXXXXXABCXXXXXXX
PQXXXXXXXXABCXXXXXXX
PQRXXXXXXXABCXXXXXXX
PQRSXXXXXXABCXXXXXXX
PXXXXXXXXXABCXXXXXXS
PXXXXXXXXXABCXXXXXRS
PXXXXXXXXXABCXXXXQRS
PXXXXXXXXXABCXXXPQRS
PQXXXXXXXXABCXXXXXXS
PQXXXXXXXXABCXXXXXRS
PQXXXXXXXXABCXXXXQRS
PQXXXXXXXXABCXXXPQRS
PQRXXXXXXXABCXXXXXXS
PQRXXXXXXXABCXXXXXRS
PQRXXXXXXXABCXXXXQRS
PQRXXXXXXXABCXXXPQRS
PQRSXXXXXXABCXXXXXXS
PQRSXXXXXXABCXXXXXRS
PQRSXXXXXXABCXXXXQRS
PQRSXXXXXXABCXXXPQRS
XXXXXXXXXXABCXXXXXXX
$
$
$ perl -lne 'if (/^([^X]{1,3})(X.*X)([^X]{1,3})$/) {($a,$b,$c)=($1,$2,$3); $a=~s/./X/g; $c=~s/./X/g; print "$a$b$c"}
elsif (/^([^X]{1,3})(X.*)/) {($a,$b)=($1,$2); $a=~s/./X/g; print "$a$b"}
elsif (/^(.*X)([^X]{1,3})$/) {($a,$b)=($1,$2); $b=~s/./X/g; print "$a$b"}
else {print}
' f25
>Read_1
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXPQRS
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
PQRSXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXPQRS
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXPQRS
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXPQRS
PQRSXXXXXXABCXXXXXXX
PQRSXXXXXXABCXXXXXXX
PQRSXXXXXXABCXXXXXXX
PQRSXXXXXXABCXXXPQRS
XXXXXXXXXXABCXXXXXXX
$
$
$
tyler_durden
awk '
/^...X/{sub("^...X","XXXX")}
/X...$/{sub("X...$","XXXX")}
1' file
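One thing to keep in mind with a dot-based pattern is that "." matches any character, so a letter that sits after the first X but still within the first few characters of the line gets overwritten as well. If that matters for your data, a stricter variant might look like the sketch below. It assumes header lines start with ">", and the function name mask_ends_awk is made up for illustration. It uses match() with RSTART and RLENGTH so that only letters outside the first and last X are touched, and it avoids interval expressions such as {1,3} because some older awk versions do not support them.

```shell
# mask_ends_awk: hypothetical helper name, not from the thread.
# Reads lines on stdin and masks runs of 1 to 3 non-X letters at either end.
mask_ends_awk() {
    awk '
    !/^>/ {
        # 1 to 3 non-X letters followed by the first X
        if (match($0, /^[^X][^X]?[^X]?X/))
            $0 = substr("XXXX", 1, RLENGTH) substr($0, RLENGTH + 1)
        # the last X followed by 1 to 3 non-X letters
        if (match($0, /X[^X][^X]?[^X]?$/))
            $0 = substr($0, 1, RSTART - 1) substr("XXXX", 1, RLENGTH)
    }
    1'
}

printf '%s\n' 'TREXXXXXXXSDFXXXXXDS' 'PXQXABCXXX' | mask_ends_awk
```

The second input line shows the difference: here PXQXABCXXX becomes XXQXABCXXX and keeps its Q, whereas a pattern such as ^...X would turn it into XXXXABCXXX.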
Franklin52's idea is much better.
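You do not really have to check whether the first 3 or last 3 characters are non-X: if the 4th character from either end is an X, the 3 characters outside it can simply be overwritten with X. The code is much shorter as a result. This relies on no non-X letter sitting between X's within the first or last 4 characters of a line, which holds for the sample data.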
$
$ cat f25
>Read_1
XXXXXXXXXXABCXXXXXXS
XXXXXXXXXXABCXXXXXRS
XXXXXXXXXXABCXXXXQRS
XXXXXXXXXXABCXXXPQRS
PXXXXXXXXXABCXXXXXXX
PQXXXXXXXXABCXXXXXXX
PQRXXXXXXXABCXXXXXXX
PQRSXXXXXXABCXXXXXXX
PXXXXXXXXXABCXXXXXXS
PXXXXXXXXXABCXXXXXRS
PXXXXXXXXXABCXXXXQRS
PXXXXXXXXXABCXXXPQRS
PQXXXXXXXXABCXXXXXXS
PQXXXXXXXXABCXXXXXRS
PQXXXXXXXXABCXXXXQRS
PQXXXXXXXXABCXXXPQRS
PQRXXXXXXXABCXXXXXXS
PQRXXXXXXXABCXXXXXRS
PQRXXXXXXXABCXXXXQRS
PQRXXXXXXXABCXXXPQRS
PQRSXXXXXXABCXXXXXXS
PQRSXXXXXXABCXXXXXRS
PQRSXXXXXXABCXXXXQRS
PQRSXXXXXXABCXXXPQRS
XXXXXXXXXXABCXXXXXXX
$
$ # Perl equivalent of Franklin52's script
$ perl -plne 's/^...X/XXXX/; s/X...$/XXXX/' f25
>Read_1
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXPQRS
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
PQRSXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXPQRS
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXPQRS
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXXXXX
XXXXXXXXXXABCXXXPQRS
PQRSXXXXXXABCXXXXXXX
PQRSXXXXXXABCXXXXXXX
PQRSXXXXXXABCXXXXXXX
PQRSXXXXXXABCXXXPQRS
XXXXXXXXXXABCXXXXXXX
$
$
tyler_durden
Thanks again, Franklin52.
Your awk script is wonderful and worked perfectly in my case.
---------- Post updated at 10:35 AM ---------- Previous update was at 10:33 AM ----------
Sorry that my mistake caused your misunderstanding, tyler_durden.
You're right.
It should be "achieve" instead of "archive".
Thanks a lot for your latest perl command too.
It worked perfectly, and the perl command is easy for me to adapt to different situations too.
Thanks again, tyler_durden.