Is it possible to remove the empty lines between humid-sets (each set starts with a ">humid" header, shown in bold) and also to remove every humid-set whose data starts with "--" (for example: > humid3 | () : | (+) )?
Thanks in advance.
Note: There will be thousands of humid-sets and more than 100 thousand lines.
input
> humid1 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj | () : | (+) | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd
> humid2 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj | () : | (+) | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd
> humid3 | () : | (+)
---------------ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj | () : | (+) | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd
output
> humid1 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj | () : | (+) | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd
> humid2 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj | () : | (+) | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd
grep -v -E '^$|^--' yourfile.txt > /tmp/output.txt
your output
> humid1 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj | () : | (+) | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd
> humid2 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj | () : | (+) | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd
> humid3 | () : | (+)
>sdfsgffsgs | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj | () : | (+) | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd
Thanks, but it should remove the whole humid3 set, up to the point where the next humid-set starts (a set starts with >humid). So the output should look like this:
> humid1 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj | () : | (+) | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd
> humid2 | () : | (+)
ababshdjbshjbjhsbfsfbksfbs----------------nb sbdnf sdbf ------
>sdfsgffsgs | () : | ||
ababbabafgsfzuyhjkvsmzbcv hjszfmcd----------fvxcv cx
>hubjbmj | () : | (+) | () : | (+)
ajdfgcbshjdcgv rsghrjcfvn rhjsgfcv hjs-------afdcbhjsdbc sjhvc sd
mirni
January 15, 2012, 10:16pm
4
Something like this?
awk '!/\n--/' ORS='\n' RS='\n\n' humid.txt
With the sample given, either of these should leave out the entire third record and also remove the empty lines:
awk '$8!~/^--/' RS= infile
or
awk '!/\n--/' RS= infile
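For anyone wondering why the second one-liner works, here is a minimal sketch on a tiny made-up sample. It assumes, as the one-liners above do, that the humid-sets are separated by empty lines; the function name drop_dashed_paragraphs is just a hypothetical wrapper for illustration.

```shell
# Hypothetical wrapper around the second one-liner above.
drop_dashed_paragraphs() {
    # RS= (empty) switches awk to paragraph mode: every block of lines
    # separated by one or more empty lines becomes a single record.
    # Lines inside a record are still joined by "\n", so /\n--/ matches
    # a record in which some line after the first starts with "--".
    # Records are printed with the default ORS (one newline), which is
    # why the empty lines between sets disappear.
    awk '!/\n--/' RS= "$@"
}

# Made-up sample: two sets separated by an empty line.
# The second set has a data line starting with "--" and is dropped.
printf '> humid1 | x\nabab--cd\n\n> humid3 | x\n--abab\n>other | y\nxyz\n' |
    drop_dashed_paragraphs
```

Note that "--" in the middle of a data line (as in abab--cd) does not match, because the pattern requires the dashes right after a newline.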
Something like this?
perl -ne '!(/^\s+$/||/^--/)&&print' inputfile
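Hi, AFAIK only some awks, such as gawk and mawk, allow RS to have more than one character. The POSIX specification says that the first character of the string value of RS is the input record separator and that the results are unspecified if RS contains more than one character, so RS='\n\n' may break with other awks. An empty RS (paragraph mode, as in the RS= examples above) is covered by POSIX.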
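If you would rather not depend on RS at all, a line-by-line approach is another option. The following is only a sketch under a few assumptions taken from the sample: every humid-set starts with a header line matching ">humid" (optionally with a space after ">"), and a set should be dropped when the first non-empty line after that header starts with "--". The function name drop_dashed_sets is hypothetical.

```shell
# Hypothetical helper: reads the named files (or stdin) line by line.
drop_dashed_sets() {
    awk '
        # Drop empty (or whitespace-only) lines everywhere.
        /^[ \t]*$/ { next }

        # A new humid-set header: hold it back until we have seen
        # the first data line of the set.
        /^> *humid/ {
            if (pending) print header   # previous set had a header only
            header = $0; pending = 1; skip = 0
            next
        }

        # First non-empty line after a held-back header decides
        # whether the whole set is kept or skipped.
        pending {
            pending = 0
            if ($0 ~ /^--/) { skip = 1; next }
            print header
        }

        # Print every line of a set that is not being skipped.
        !skip { print }

        # A header on the very last line has no data line; keep it.
        END { if (pending) print header }
    ' "$@"
}

# Made-up sample: the humid3 set is dropped, empty lines are removed.
printf '> humid1 | x\nabc--def\n\n>other | y\nxyz\n\n> humid3 | x\n--abc\n>other | y\nxyz\n' |
    drop_dashed_sets
```

Because it does not rely on empty lines to find the set boundaries, this also works when the sets are not separated by empty lines, and it reads the file in a single pass, which should be fine for a few hundred thousand lines.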
botao
January 23, 2012, 5:11pm
8
try :
$ cat yourfiles | tr -s "\n"
hth
alexandre botao (progsmith,polymath,ideator)
"comets never dodge"