Separate the number from a character variable into a new column

Hi all,

I want to extract the number from a character variable into a new column. My data looks like this:

chr    pos   A1  A2
chr1   1245  T   A
chr2   6789  G   C

I want it to look like this:

chr    pos   A1  A2  chr_num
chr1   1245  T   A   1
chr2   6789  G   C   2

How can I do this on the command line? Also, does anyone know how to do this in R?

I tried this in R, but it didn't quite work:

chr <- gsub("chr", "", paste(short_sleep_hg38_hg19$chr))

Thank you!

Please start using markdown code tags when posting code and data samples; this can improve the traction of your threads.
Having said that... what have you tried so far (other than R)?

With a POSIX-compliant awk:

awk '{ f1=$1; gsub(/[^[:digit:]]/,"", f1); printf "%s %7s\n", $0, (NR==1 ? "chr_num" : f1) }' file

This removes all non-digits from field #1.
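A minimal usage sketch, assuming a POSIX sh and the sample data from the question: the function name add_chr_num is hypothetical and simply wraps the gsub one-liner above so it reads either a file argument or standard input (here a here-document).

```shell
#!/bin/sh
# Hypothetical wrapper around the gsub-based one-liner; reads the files given
# as arguments, or standard input when there are none.
add_chr_num() {
  awk '{
    f1 = $1                       # copy field 1 so $0 stays unchanged
    gsub(/[^[:digit:]]/, "", f1)  # delete every non-digit: "chr2" -> "2"
    # Line 1 is the header, so it gets the label instead of a number.
    printf "%s %7s\n", $0, (NR == 1 ? "chr_num" : f1)
  }' "$@"
}

# Usage on the sample data from the question.
add_chr_num <<'EOF'
chr    pos   A1  A2
chr1   1245  T   A
chr2   6789  G   C
EOF
```

Because $0 is printed unchanged and only a right-aligned column is appended, the new column lines up only as well as the input columns already do.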
The following variant removes only the first contiguous run of non-digits from field #1:

awk '{ f1=$1; sub(/[^[:digit:]]+/,"", f1); printf "%s %7s\n", $0, (NR==1 ? "chr_num" : f1) }' file

I will absolutely start using the markdown tags. To answer your question: I had looked at several websites but had trouble adapting their code to my needs. However, I found something that worked:

awk -F"chr" '{$1=$1} 1' OFS="\t" sample.txt > sample2.txt
awk -F'chr|[[:blank:]]' '{print $0 "\t" ($2?$2:"chr_num")}'
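The last one-liner works by treating the literal chr as a field separator alongside blanks. A minimal sketch under the same assumptions as before (POSIX sh, the question's sample data fed through a here-document); the function name append_chr_num_fs is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the field-separator one-liner. With FS set to
# the ERE 'chr|[[:blank:]]', the leading "chr" splits off an empty $1, so $2
# holds whatever follows "chr" (the number on data lines). On the header,
# "chr" is followed directly by a blank, so $2 is empty (false) and the
# label "chr_num" is printed instead.
append_chr_num_fs() {
  awk -F'chr|[[:blank:]]' '{ print $0 "\t" ($2 ? $2 : "chr_num") }' "$@"
}

append_chr_num_fs <<'EOF'
chr    pos   A1  A2
chr1   1245  T   A
chr2   6789  G   C
EOF
```

One caveat of the ternary test: a value of 0, or an empty $2 caused by leading whitespace on a line, is treated as false and would be replaced by chr_num.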