Remove empty lines and sets that start with --

Is it possible to remove the empty lines between the > humid sets, and also to remove any humid set whose data starts with -- (for example: > humid3 | () : | (+))?
Thanks in advance.

Note: There will be thousands of humid sets, and the file will have more than 100 thousand lines.

input

> humid1 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs  | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj  | () : | (+)  | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd

> humid2 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs  | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj  | () : | (+)  | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd

> humid3 | () : | (+)
---------------ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs  | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj  | () : | (+)  | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd

output

> humid1 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs  | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj  | () : | (+)  | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd
> humid2 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs  | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj  | () : | (+)  | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd

grep -v -E '^$|^--' yourfile.txt > /tmp/output.txt

your output

> humid1 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs  | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj  | () : | (+)  | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd
> humid2 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs  | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj  | () : | (+)  | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd
> humid3 | () : | (+)
>sdfsgffsgs  | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj  | () : | (+)  | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd

Thanks, but it should remove the whole humid3 set, up to the point where the next humid set starts (each set starts with > humid). So the output should look like this:

> humid1 | () : | (+) 
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------ 
>sdfsgffsgs  | () : | || 
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj  | () : | (+)  | () : | (+) 
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd 
> humid2 | () : | (+) 
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------ 
>sdfsgffsgs  | () : | || 
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx 
>hubjbmj  | () : | (+)  | () : | (+) 
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd

Something like this?

awk '!/\n--/' ORS='\n' RS='\n\n' humid.txt

With the sample given, this should leave out the entire third record and also remove the empty lines:

awk '$8!~/^--/' RS= infile

or

awk '!/\n--/' RS= infile
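
To see what the paragraph-mode one-liner does, here is a minimal sketch on a small made-up file. The sample data and the function name filter_records are only assumptions for illustration, not the real humid data. Setting RS to the empty string makes awk treat each blank-line-separated block as one record, so a record can be dropped as a whole when any line after its first one starts with --.

```shell
# Hypothetical sample: three blank-line-separated sets; in the third set
# the data line starts with "--".
sample=$(mktemp)
printf '%s\n' \
  '> humid1 | () : | (+)' \
  'abab----nb' \
  '' \
  '> humid2 | () : | (+)' \
  'cdcd----xy' \
  '' \
  '> humid3 | () : | (+)' \
  '----efef' \
  '>next | () : | ||' \
  'ghgh' > "$sample"

# RS = "" switches awk to paragraph mode (one record per block).
# !/\n--/ keeps a record only if no line after the first starts with "--".
filter_records() {
  awk 'BEGIN { RS = "" } !/\n--/' "$1"
}

filter_records "$sample"
```

Because ORS is still a single newline, the kept records are printed back to back, so the empty lines between sets disappear as well.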

Something like this?

perl -ne '!(/^\s+$/||/^--/)&&print' inputfile

Hi, AFAIK only gawk and mawk allow RS to contain more than one character. The POSIX specification says that the first character of the string value of RS is the input record separator and leaves the result unspecified when RS contains more than one character, so RS='\n\n' may break with other awks.
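
If a multi-character RS is a concern, one alternative is to read line by line and buffer one set at a time. This is only a sketch under assumptions: the function names filter_stream and emit are made up, the sample data is invented, and it assumes every set begins with a line matching "> humid" as described above. It only holds one set in memory, which should be fine for a file with more than 100 thousand lines.

```shell
# Hypothetical sample: the second set has a data line starting with "--".
sample=$(mktemp)
printf '%s\n' \
  '> humid1 | () : | (+)' \
  'abab----nb' \
  '' \
  '> humid2 | () : | (+)' \
  '----cdcd' \
  '>sub | () : | ||' \
  'efef' \
  '' \
  '> humid3 | () : | (+)' \
  'ghgh' > "$sample"

filter_stream() {
  awk '
    # Print the buffered set if it was not marked for removal, then reset.
    function emit() { if (buf != "" && keep) printf "%s", buf; buf = ""; keep = 1 }
    BEGIN { keep = 1 }
    /^> *humid/ { emit() }       # a new set starts: flush the previous one
    /^[ \t]*$/  { next }         # skip empty lines
    /^--/       { keep = 0 }     # a line starting with "--" marks the whole set
    { buf = buf $0 "\n" }        # collect the lines of the current set
    END { emit() }               # flush the last set
  ' "$1"
}

filter_stream "$sample"
```

Unlike the RS-based versions, this does not depend on blank lines separating the sets, only on the "> humid" header line.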

try :

$ cat yourfiles | tr -s "\n"
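
A small sketch of what tr -s does here, on made-up input (the function name squeeze_newlines is just for illustration). It squeezes runs of newlines into one, so the empty lines go away, but lines starting with -- are left in place, so the humid3 set would still have to be removed by other means.

```shell
# Squeeze repeated newlines into a single newline.
squeeze_newlines() {
  tr -s '\n'
}

# Hypothetical input: two empty lines between "a" and "b", then a "--" line.
printf 'a\n\n\nb\n--c\n' | squeeze_newlines
```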

hth

alexandre botao (progsmith,polymath,ideator)
"comets never dodge"