Extract text between two character positions

Twinklefingers · January 31, 2012, 10:31am

Greetings.

I need to extract the text between two character positions, e.g., all text between characters 4921 and 6534.

The text blocks are FASTA-format sequences of whole chromosomes, so basically about a million characters, each one A, T, G, or C. For example:

>Chr_1
ACCTGTTCAACTCTCAGGACTCTCAGGTCAACTCTCAG
CAACTCTCAGGAACTCTCAGGTCAACTCTCACTCTCAG
GTCAACTCTCCAGGAACTCTCCACTCTCAGAGGTCAAC
.......

I need to extract a region of genes, and I know the character positions that mark its boundaries.

I need the equivalent of what this does for lines:

sed -n 'line1,line2p' > new_file.txt

But for character positions.

Thanks!

Scrutinizer · January 31, 2012, 10:56am

See if this works:

awk 'NR>1{p=$0;sub($1 ORS,x,p);gsub(ORS,x,p); print RS $1 ORS substr(p, 4921,6534-4921+1)}' RS=\> OFS= infile
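
In case the one-liner is hard to read, here is a longer sketch of the same idea: join all the sequence lines of each record into one string, then take the substring. It is only a sketch under some assumptions: the function name extract_range is made up for illustration, positions are 1-based and inclusive, line breaks are not counted as characters, and every record in the file gets the same range cut out of it.

```shell
# extract_range FILE START END   (hypothetical helper, not a standard tool)
# Prints each FASTA header followed by characters START..END of that
# record's sequence, counted after the line breaks have been removed.
extract_range() {
    awk -v start="$2" -v end="$3" '
        /^>/ {
            # Header line: print the previous record (if any), then start a new one.
            if (name != "") print name ORS substr(seq, start, end - start + 1)
            name = $0
            seq = ""
            next
        }
        {
            # Sequence line: append it without its newline.
            seq = seq $0
        }
        END {
            # Print the last record in the file.
            if (name != "") print name ORS substr(seq, start, end - start + 1)
        }
    ' "$1"
}
```

You would call it as: extract_range infile 4921 6534 > new_file.txt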

Twinklefingers · January 31, 2012, 11:07am

Perfect! It works! Thanks!

Scrutinizer · January 31, 2012, 11:24am

You're welcome.