I have a file that stores data in pairs of lines, following this format:
line 1: header (preceded by ">")
line 2: sequence
example.txt:
>seq1 name
GATTGATGTTTGAGTTTTGGTTTTT
>seq2 name
TTTTCTTC
I want to extract the sequences, together with their headers, for all sequences shorter than 11 characters. Desired output:
>seq2 name
TTTTCTTC
I can test each line's length and print any line shorter than 11 characters along with the line before it (its header). The problem I'm having is skipping the headers (i.e. lines beginning with ">") when I do the length test, since a header can itself be shorter than 11 characters.
For example:
awk '{lines[NR] = $0} length($0) < 11 {print lines [NR-1]; print lines [NR]} ' example.txt
gives me:
>seq1 name
GATTGATGTTTGAGTTTTGGTTTTT
>seq2 name
>seq2 name
TTTTCTTC
How do I tell awk to ignore lines beginning with ">" when checking the length?
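One possible approach, as a minimal sketch: keep the most recent header in a variable and call `next` on header lines so they never reach the length test. The names `header` and `result` are only illustrative, and the `printf` line just recreates the sample file shown above. It assumes every record is exactly one header line followed by one sequence line.

```shell
# Recreate the sample input from the question.
printf '%s\n' '>seq1 name' 'GATTGATGTTTGAGTTTTGGTTTTT' '>seq2 name' 'TTTTCTTC' > example.txt

# Header lines (starting with ">"): remember the header, then skip to the next line.
# Sequence lines: print the saved header and the sequence if it is shorter than 11 characters.
result=$(awk '/^>/ {header = $0; next} length($0) < 11 {print header; print}' example.txt)
printf '%s\n' "$result"
```

With the sample input this should print only the seq2 header and its sequence. If sequences can wrap over several lines, the length would have to be accumulated per record instead of tested per line.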