How to repeat a character in a field if it's a single character?

I have a CSV dataset like this:

C,rs18768
G,rs13785
GA,rs1065
G,rs1801279
T,rs9274407
A,rs730012

I'm thinking of using something like awk or sed to convert the dataset to this format (if the first field is two characters, it stays the same):

CC,rs18768
GG,rs13785
GA,rs1065
GG,rs1801279
TT,rs9274407
AA,rs730012

Could anyone give me some clues?

Hello nengcheng,

It is always recommended to include in your post the efforts you have made to solve your own problem. Could you please try the following.

awk 'BEGIN{FS=OFS=","} length($1)==1{sub(/.*/,"&&",$1)} 1 '  Input_file

Output will be as follows.

CC,rs18768
GG,rs13785
GA,rs1065
GG,rs1801279
TT,rs9274407
AA,rs730012

2nd solution: use the value of $1 itself to double it.

awk 'BEGIN{FS=OFS=","} length($1)==1{$1=$1$1} 1'   Input_file
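To try either awk one-liner without creating Input_file first, the sample lines can be fed on standard input. A minimal sketch in POSIX sh; `double_single` is a hypothetical wrapper name around the 2nd solution, and any file names passed to it go straight to awk:

```sh
#!/bin/sh
# double_single (hypothetical name): wraps the 2nd awk solution.
# Field 1 is doubled only when it is exactly one character long;
# the trailing `1` prints every line, changed or not.
double_single() {
    awk 'BEGIN{FS=OFS=","} length($1)==1{$1=$1$1} 1' "$@"
}

# With no file arguments, awk reads standard input (here, a here-document).
double_single <<'EOF'
C,rs18768
G,rs13785
GA,rs1065
G,rs1801279
T,rs9274407
A,rs730012
EOF
```

This should print the same six lines as the output shown above.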

Thanks,
R. Singh

sed 's/^\(.\),/\1&/'
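How the sed one-liner works: `\(.\)` captures the single character before the comma, and in the replacement `&` stands for the whole match (for example `C,`), so `\1&` becomes `CC,`. Because `^\(.\),` requires the comma to be the second character on the line, a two-character field such as `GA` never matches. A quick check on three of the sample lines:

```sh
#!/bin/sh
# \1 = the captured first character, & = the entire match ("C,"),
# so "C,rs18768" -> "CC,rs18768"; "GA,rs1065" does not match ^.,
# Expected output: CC,rs18768 / GA,rs1065 / TT,rs9274407
printf '%s\n' 'C,rs18768' 'GA,rs1065' 'T,rs9274407' |
    sed 's/^\(.\),/\1&/'
```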

Cool solution, nez! How about a pure Bash one, just for fun :)

while IFS=, read field1 field2
do
  if [[ ${#field1} -eq 1 ]]
  then
      field1=${field1}${field1}
  fi
  echo "$field1,$field2"
done < "Input_file"
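A slightly more defensive take on the same loop, as a sketch rather than a drop-in replacement: `-r` keeps `read` from treating backslashes as escape characters, the `|| [ -n "$field1" ]` guard still processes a last line that has no trailing newline (plain `read` returns non-zero there, so the loop would skip it), and `printf` avoids `echo`'s escape and option quirks. It uses `[ ]` so it runs under both bash and sh; the function name `double_first_field` is hypothetical.

```sh
#!/bin/sh
# double_first_field (hypothetical name): reads CSV lines on stdin.
double_first_field() {
    # The "|| [ -n ... ]" part handles a final line without a newline.
    while IFS=, read -r field1 field2 || [ -n "$field1" ]
    do
        if [ ${#field1} -eq 1 ]
        then
            field1=$field1$field1    # single character: double it
        fi
        printf '%s,%s\n' "$field1" "$field2"
    done
}

# The last line deliberately has no trailing newline.
printf 'C,rs18768\nGA,rs1065' | double_first_field
```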

Thanks,
R. Singh

#!/bin/bash
while read -n2 a; do
        read b
        echo ${a//,/$a}$b
done < file
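The read -n2 version above works because `read -n2 a` takes only the first two characters of a line and the following `read b` takes the rest. For `C,rs18768`, `a` is `C,` and `${a//,/$a}` swaps its comma for `C,`, giving `CC,`; for `GA,rs1065`, `a` is `GA` (no comma, so unchanged) and `b` is `,rs1065`. Both `read -n` and `${var//pattern/replacement}` are bash extensions, so here is a sketch of the same two-character split in POSIX sh; the function name `split_double` is hypothetical:

```sh
#!/bin/sh
# split_double (hypothetical name): POSIX version of the read -n2 idea.
split_double() {
    while IFS= read -r line
    do
        rest=${line#??}          # everything after the first two characters
        head=${line%"$rest"}     # the first two characters themselves
        case $head in
            ?,) head=${head%,}$head ;;   # "C," -> "C" + "C," = "CC,"
        esac
        printf '%s%s\n' "$head" "$rest"
    done
}

printf '%s\n' 'C,rs18768' 'GA,rs1065' | split_double
```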

--- Post updated at 10:37 ---

#!/bin/bash
while read a; do
        echo ${a/#?,/${a%,*}${a%,*},}
done < file
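In this second version, `${a/#?,/...}` (a bash extension) replaces a leading `?,`, that is, one character followed by a comma, with `${a%,*}` twice plus the comma. `${a%,*}` removes the shortest suffix matching `,*`, so it keeps everything before the last comma. For two-column lines like these that is exactly the first field, but on a line with more columns `${a%%,*}` (longest match, so it cuts at the first comma) would be needed. A small POSIX sh check; the three-column line is a hypothetical input, not from the question:

```sh
#!/bin/sh
a='C,rs18768'
printf '%s\n' "${a%,*}"     # C

b='C,x,y'                   # hypothetical three-column line
printf '%s\n' "${b%,*}"     # C,x  (cut at the last comma)
printf '%s\n' "${b%%,*}"    # C    (cut at the first comma)
```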

Thank you, Singh. I will try that next time.

--- Post updated at 03:55 AM ---

Thank you, nezabudka; it also worked for me.

Hi Ravinder...

Just removing one set of [] makes your version fully POSIX compliant:

#!/usr/local/bin/dash

echo 'C,rs18768
G,rs13785
GA,rs1065
G,rs1801279
T,rs9274407
A,rs730012' > /tmp/text

while IFS=, read field1 field2
do
    if [ ${#field1} -eq 1 ]
    then
        field1=${field1}${field1}
    fi
    echo "$field1,$field2"
done < /tmp/text

Results, OSX 10.14.3, default bash terminal, calling dash:

Last login: Sun Apr 28 11:31:12 on ttys000
AMIGA:amiga~> cd desktop/Code/Shell
AMIGA:amiga~/desktop/Code/Shell> ./add_single_char.sh
CC,rs18768
GG,rs13785
GA,rs1065
GG,rs1801279
TT,rs9274407
AA,rs730012
AMIGA:amiga~/desktop/Code/Shell> _
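Another POSIX option, sketched here, is to let a case pattern do the length test: `?` matches exactly one character, so no `${#field1}` comparison is needed. It reads standard input instead of /tmp/text; the function name `double_case` is hypothetical:

```sh
#!/bin/sh
# double_case (hypothetical name): `?` in a case pattern matches
# exactly one character, so only one-character first fields are doubled.
double_case() {
    while IFS=, read -r field1 field2
    do
        case $field1 in
            ?) field1=$field1$field1 ;;
        esac
        printf '%s,%s\n' "$field1" "$field2"
    done
}

double_case <<'EOF'
C,rs18768
GA,rs1065
T,rs9274407
EOF
```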
$ awk -F, -v OFS=, ' { sub("^.$","&&",$1) } 1 ' file
CC,rs18768
GG,rs13785
GA,rs1065
GG,rs1801279
TT,rs9274407
AA,rs730012
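Here `sub()` is applied to `$1` only, and the anchors in `^.$` make the regex match the whole field, so only a one-character field is replaced by `&&` (the matched text twice). Without the anchors, the first character of every field would be doubled. A quick comparison:

```sh
#!/bin/sh
# Anchored: only a field that is exactly one character is doubled.
printf '%s\n' 'C,rs18768' 'GA,rs1065' |
    awk -F, -v OFS=, '{ sub(/^.$/, "&&", $1) } 1'
# CC,rs18768
# GA,rs1065

# Unanchored: the first character of any field is doubled.
printf '%s\n' 'GA,rs1065' |
    awk -F, -v OFS=, '{ sub(/./, "&&", $1) } 1'
# GGA,rs1065
```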