grep: grab the next 19 characters after '>' or the full line

Hi,

I have a file like this:

>hg19_chr1_123_456_+
asndbansbdahsjdbfsjhfghjdsghjdghjdjhdghjjdkhfsdkjfhdsjkdkjghkjdhgfjkhjfkf
hasjgdhjsgfhjdsgfdsgfjhdgjhdjhdhjdfhjdfjgfdfbdghjbfjksdhfjsfdghjgdhjgfdjhgd
jhgdfj
>hg19_chr1_123_456_-
akjldshfuiewyruiewehbjhvbdcnmbfhdsjfjdbfhdbhjdbghjfdbghjbdfghjdfbjhbkk
jsdhfjdgjfdgjfdgjkfdhjkfhkjfhjkfjkhkjfskjdfhkjhgjkdgkjfdhgjkfhjkfhgkfhkfkjhgf
dshjghjdg
>hg19_chr2_234_456_+
skjfhdsjkfghdjkghdfjkhgjkfdghjkdfuiertytoierytuireyteiruytueriyteruytierutye
sjhdjashdjahjkdasjkhdajkshdkajshdkashdasruweyriweyrueiwryewrewurewuu
jdhfjkdshf

I want to match the lines that start with '>' and grab the '>' plus the next 19 characters, or the whole line.

So my output would be:

>hg19_chr1_123_456_+
>hg19_chr1_123_456_-
>hg19_chr2_234_456_+

Like this:

sed -n '/^>/s:\(.\{20\}\).*:\1:p' infile
awk '/^>/{print substr($0,1,20)}' input-file

@neutron scott: It worked for me. Some of my coordinates have more than 3 digits, so I used 25 instead of 20.

@elixir_sinari: Your command works too, but when I increase the 20 to 25, it omits some records. I don't know why.

Thanks to both of you.

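Yes, the sed pattern there matches exactly 20 characters rather than up to 20. With 25, any header line shorter than 25 characters fails to match, so the substitution does not happen and the p flag never prints it. I believe it would need to be \{0,25\}.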
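
A minimal sketch of that change, in case it helps. The function name headers_upto and the sample lines are made up for illustration; only the \{0,N\} interval is the actual fix:

```shell
# headers_upto is a made-up helper name; its first argument is the maximum
# number of characters to keep from each header line, including the '>'.
headers_upto() {
    n=$1
    shift
    # \{0,N\} matches up to N characters, so a short header still matches
    # and is printed whole; \{N\} would silently skip it.
    sed -n "/^>/s/^\(.\{0,$n\}\).*/\1/p" "$@"
}

# Invented sample: the first header is shorter than 25 characters.
printf '%s\n' '>hg19_chr1_123_456_+' 'asndbansbd' \
    '>hg19_chr12_1234567_2345678_+' 'jhgdfj' | headers_upto 25
```

Both header lines are printed: the short one unchanged and the long one cut to 25 characters. With \{25\} in place of \{0,25\}, the short one would be dropped.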

You said you want the next 19 letters or the whole line.

You have been given two ways to get the '>' and the next 19 letters, but now you seem to want the next 24 characters.

If you want the entirety of lines starting with '>', just use:

grep '^>' file

otherwise, you need to explain how we are supposed to determine when you want 19 characters, when you want 24 characters, and when you want the whole line.
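
If the rule turns out to be "at most N characters, or the whole line when the header is shorter", the awk version already behaves that way, because substr stops at the end of the string. A rough sketch, assuming that rule; the function name first_n_of_headers and the sample input are hypothetical:

```shell
# first_n_of_headers is a hypothetical helper; its first argument is the
# maximum number of characters to print from each line starting with '>'.
first_n_of_headers() {
    n=$1
    shift
    # substr() returns the whole line when it is shorter than n characters.
    awk -v n="$n" '/^>/ { print substr($0, 1, n) }' "$@"
}

# Invented sample with one short and one long header.
printf '%s\n' '>hg19_chr1_123_456_+' 'asndbansbd' \
    '>hg19_chr12_1234567_2345678_+' 'jhgdfj' | first_n_of_headers 25
```

Passing the length as a parameter avoids editing the command each time the coordinates get longer.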
