Hello everyone,
I have some data files with mixed header formats. A sample is shown below:
>ABCD76567.x1
AGTCGATCGTAGTCGTAGCTGT
>ABCD76567.y1
AGTCGATCGTAGTCGTAGCTGT
>ABCD76568.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76568.y1 pair_info:893489
AGTCGATCGTAGTCGTAGCTGT
>ABCD76569.x1 pair_info:892189
AGTCGATCGTAGTCGTAGCTGT
>ABCD76569.y1 pair_info:2098308
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.x01 pair_info:8787321
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.y1 pairs_info:898989,87574
AGTCGATCGTAGTCGTAGCTGT
>ABCD76571.x1 pair_info:1626762
AGTCGATCGTAGTCGTAGCTGT
>ABCD76572.x1 pairs_info:898989,34374
AGTCGATCGTAGTCGTAGCTGT
>ABCD76572.y01 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76572.y1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76573.y1 pair_info:113242
AGTCGATCGTAGTCGTAGCTGT
...
....
..
..
I only need to look at the first field of each header line, and there are 3 things I need to achieve:
- Headers that have no "pair_info" (or "pairs_info") field are to be put in one file, together with their sequences, like this:
>ABCD76567.x1
AGTCGATCGTAGTCGTAGCTGT
>ABCD76567.y1
AGTCGATCGTAGTCGTAGCTGT
...
....
...
- Headers with "pair_info" or "pairs_info" are to be put in a separate file, so that it looks like this:
>ABCD76568.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76568.y1 pair_info:893489
AGTCGATCGTAGTCGTAGCTGT
>ABCD76569.x1 pair_info:892189
AGTCGATCGTAGTCGTAGCTGT
>ABCD76569.y1 pair_info:2098308
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.x1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
>ABCD76570.y1 pairs_info:898989,87574
AGTCGATCGTAGTCGTAGCTGT
>ABCD76572.x1 pairs_info:898989,34374
AGTCGATCGTAGTCGTAGCTGT
>ABCD76572.y1 pair_info:898989
AGTCGATCGTAGTCGTAGCTGT
From the above, I do not need entries whose header has no mate, such as
>ABCD76573.y1 (no corresponding *.x1 pair) and >ABCD76571.x1 (no corresponding *.y1 pair).
Thanks!