The first 2 sequences are identical (though their IDs and frequencies differ). The same goes for the last 2. What I need is to compare all sequences within the file and, where they are identical, 'compress' them into one entry with a recalculated frequency. Thus, I should end up with the following file:
The last two sequences were not 'combined' into one.
This is what I get:
Note that the highlighted sequences are identical (character by character, not only in length), yet they still were not compressed and counted as one entry with a frequency of 13.
That is weird, because I just tried it on your test data and it did combine those lines. Keep in mind that this command outputs the records in random order. Also, double-check that you copied the code correctly.
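The command itself isn't shown here, so as a rough illustration of the merge step, here is a minimal Python sketch. It assumes a hypothetical tab-separated format `ID<TAB>frequency<TAB>sequence` (the real file layout may differ), keeps the first ID seen for each sequence, and sums the frequencies. It uses a plain `dict`, which preserves insertion order in Python 3.7+, so the output follows first-seen order instead of a random one. The function name `merge_identical` is hypothetical.

```python
def merge_identical(lines):
    """Collapse records whose sequences are identical and sum their frequencies.

    Assumes (hypothetically) each line is 'ID<TAB>frequency<TAB>sequence'.
    The first ID seen for a sequence is kept; output follows first-seen order.
    """
    merged = {}  # sequence -> [first_id, total_frequency]; dict keeps insertion order
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Only the newline is removed: trailing spaces or '\r' stay part of the sequence.
        rec_id, freq, seq = line.rstrip("\n").split("\t")
        if seq in merged:
            merged[seq][1] += int(freq)
        else:
            merged[seq] = [rec_id, int(freq)]
    return [f"{rec_id}\t{total}\t{seq}" for seq, (rec_id, total) in merged.items()]


if __name__ == "__main__":
    sample = ["a\t4\tACGT", "b\t6\tACGT", "c\t2\tTTGA"]  # made-up example records
    for row in merge_identical(sample):
        print(row)
```

Because only the newline is stripped, two sequences that differ only by a trailing space or a `\r` are treated as different keys and will not be merged.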
I tried once more and it still did not combine the last 2. The order is random, but I can still see both sequences: instead of ending up with 5 different sequences, my file contains 6. I have also modified the test data, and it is definitely not working. I added 1 more sequence (freq 10), identical to the first 2, at the very end of the file, and it was not combined with them.
Maybe one of those lines contains a trailing space? Or some other non-printable character? You should probably examine the file with a hex editor (or with vim).
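As an alternative to a hex editor, a short Python sketch can flag lines that end in whitespace or contain non-printable characters, printing them with `repr()` so hidden characters become visible. It assumes tabs are legitimate field separators (a hypothetical; adjust for the real format), and the function name `find_hidden_chars` is made up for illustration. Opening the file with `newline=""` matters: in the default universal-newlines mode Python translates `\r\n` to `\n`, which would hide exactly the stray carriage returns being looked for.

```python
import sys


def find_hidden_chars(lines):
    """Return (line_number, repr(line)) for every line that has trailing
    whitespace or a non-printable character such as '\\r' or a non-breaking space."""
    suspicious = []
    for n, line in enumerate(lines, start=1):
        body = line.rstrip("\n")
        has_trailing_ws = body != body.rstrip()
        # Tabs are assumed to be field separators, so they are not reported.
        has_nonprintable = any(not ch.isprintable() and ch != "\t" for ch in body)
        if has_trailing_ws or has_nonprintable:
            suspicious.append((n, repr(body)))
    return suspicious


if __name__ == "__main__" and len(sys.argv) > 1:
    # newline="" keeps '\r\n' intact instead of translating it to '\n'.
    with open(sys.argv[1], encoding="utf-8", newline="") as fh:
        for n, text in find_hidden_chars(fh):
            print(f"line {n}: {text}")
```

Saved under a hypothetical name such as `find_hidden.py`, it would be run as `python find_hidden.py yourfile.txt`; a line reported as ending in `\r` or a space would explain why two visually identical sequences were not merged.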