Can someone please suggest a script to join the following into one single (continuous) line, so that a pattern search can be carried out on the resulting line?
Note: this is only a sample (the real data may be shorter or longer), and it will be contained in a text file.
Like Optimus_P, I don't understand why the OP is ignoring my solution to his problem. For the record, when the above input data is run against my script, it outputs:
Line: 20 At position 1 2 unmatched characters
Line: 20 At position 3 MATCH: rrrgds
Line: 20 At position 9 43 trailing characters
pvRRRGDSrgsllsprpvsylkgssggpllcpfghavgifraavctrgva
Line: 49 At position 1 39 unmatched characters
Line: 49 At position 40 MATCH: rhrars
Line: 49 At position 46 6 trailing characters
iierlhglsafslhsyspgeinrvasclrklgvpplrvwRHRARSvrarl
I then joined all of the lines together into one superline. And I commented out the 'echo "$image"' in my script so that it won't print out the line with matches upshifted. When the superline is run against my script, it outputs:
Line: 1 At position 1 952 unmatched characters
Line: 1 At position 953 MATCH: rrrgds
Line: 1 At position 959 289 unmatched characters
Line: 1 At position 1248 MATCH: rgrfvt
Line: 1 At position 1254 1186 unmatched characters
Line: 1 At position 2440 MATCH: rhrars
Line: 1 At position 2446 65 trailing characters
So I got one more match. This explains the motivation for trying to join the lines. I think a better solution is to modify the scripts to find matches across line boundaries. Eliminating the line boundaries is hard and neither of the solutions posted worked very well.
The data file has 2510 letters. That exceeds the maximum line length that vi can handle, at least on HP-UX. So vi didn't work. As for the tr solution, I tried:
tr -d "\n" < file1 > file2
which kinda worked, but it left the file with no newline characters at all. Thus the file had zero lines. I used:
echo >> file2
to correct this problem.
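The two steps above (delete the newlines, then put a single one back) can be combined. This is only a minimal sketch: the function name join_lines is made up for illustration, and stripping carriage returns is an extra assumption in case the file was saved with DOS line endings.

```shell
# join_lines: hypothetical helper name, not part of any script posted in this thread.
# Reads text on stdin and writes it to stdout as one single line.
join_lines() {
    tr -d '\r\n'   # delete every newline (and any carriage return, an assumption for DOS-style files)
    echo           # add back exactly one terminating newline so the result is a valid one-line file
}
```

Usage would be something like: join_lines < file1 > file2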
At this point, my script worked and spit out the above results, but at some point, ksh will balk at reading a giant line. That's why switching to an algorithm that can match across line boundaries would be the better approach.
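For what it's worth, here is a rough sketch of what matching across line boundaries could look like without ever building the giant line. It is not the script discussed above. It assumes the motif has a fixed width, and the pattern used in the usage line (r.r..[st], width 6) is only a guess inferred from the three matches shown earlier. The name find_motif and its two arguments are hypothetical. The idea is to carry the last width-1 characters of each line over to the next one, so a motif split by a newline is still seen, while positions are reported relative to the whole sequence.

```shell
# find_motif: hypothetical helper. $1 = regular expression for a FIXED-WIDTH motif,
# $2 = that width in characters. Reads the sequence on stdin, one chunk per line.
find_motif() {
    awk -v re="$1" -v w="$2" '
    {
        line = tolower($0)
        sub(/\r$/, "", line)              # tolerate DOS line endings (assumption)
        buf  = carry line                 # tail of the previous line + current line
        base = consumed - length(carry)   # number of residues that precede buf
        pos  = 0
        s    = buf
        while (match(s, re)) {
            printf "position %d MATCH: %s\n", base + pos + RSTART, substr(s, RSTART, RLENGTH)
            pos += RSTART                 # resume one character past the match start
            s = substr(s, RSTART + 1)     # so overlapping matches are also reported
        }
        consumed += length(line)
        # Keep only the last w-1 characters: too short to hold a whole motif,
        # so a match is never reported twice.
        carry = (length(buf) >= w) ? substr(buf, length(buf) - w + 2) : buf
    }'
}
```

Usage would be something like: find_motif 'r.r..[st]' 6 < file1

Since only one input line plus five carried-over characters is held at a time, this avoids the long-line limits mentioned above, at the cost of only working for motifs of a known fixed width.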
Yes, so far all the suggestions are correct, but let's clarify some points.
Making the multiple lines into a single line does NOT tamper with the data, as it is in fact ONE continuous line that is presented on the protein description web pages as multiple lines for ease of display.
The problem comes in doing matches across line boundaries if the data is NOT presented to the ksh script as one single line.
If we have a solution to make the searches possible across line boundaries then we have a winner.
My wife and I are carrying out tests with the ksh script, and we came up with protein sequences that have many lines, sometimes as many as 50.
So either we try to fix the ksh script, or use Perl, or we have to ask our C gurus, which is why I posted there as well.
Someone appeared to be annoyed about my posting under C.
To clarify, we aren't doing homework; this is an essential part of an advanced breast cancer research dissertation.