The problem statement, all variables and given/known data:
Problem Set:
Before you get started working with these challenges, be aware that the first challenge is reformatting the test data file so that you get rid of the 'header' and get all of the columns
delimited for working with in unix. (I'll give you another clue in addition to getting rid of the header: learn 'grep', 'cat', 'cut', 'awk', 'sed')
write a script to change the extension of your file: Test_Data.snp to Test_Data.txt
print all lines that have an 'A' base call either in the reference (column 2) or query (column 3) strain
print only the column titled 'LEN R' to a new file called Reference_length.txt
sort the file by column 4 (titled [P2])
print only the lines that have a basecall in columns 2 and 3 (under [SUB] headings) and sort by [LEN R], output to a new file called snp_report.txt
Relevant commands, code, scripts, algorithms:
I'm not sure what this means?
The attempts at a solution (include all code and scripts):
The only thing I know how to do is actually show the data set in the terminal window
Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):
This is part of a learning scholarship over the summer. I am working with Dr. Mia Champion of TGEN North in Flagstaff. She recommended that I come here for help.
Thanks for any help you can provide. I literally just started learning this a day ago, so please bear with me.
You might want to post a sample of the file layout next time, rather than asking us to download your whole file. Otherwise, the following should answer most, if not all, of the tasks in succession. Just so you're aware: there's always more than one way to do it.
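Here is a minimal sketch of one way to approach each of the five tasks. It is not based on the real Test_Data.snp layout: it assumes the header has already been removed, that the columns are tab-delimited, that [LEN R] sits in column 5, and that a missing base call is written as a dot. The sample rows and the intermediate file names a_calls.txt and sorted_by_p2.txt are made up for illustration, so adjust the column numbers to match your own file.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: tab-delimited, header already removed.
# Assumed columns: 1=[P1]  2=[SUB] reference  3=[SUB] query  4=[P2]  5=[LEN R]
printf '10\tA\tG\t30\t500\n'  > Test_Data.snp
printf '20\tC\t.\t12\t300\n' >> Test_Data.snp
printf '35\tT\tA\t25\t400\n' >> Test_Data.snp
printf '50\t.\tC\t18\t200\n' >> Test_Data.snp

# sort needs a literal tab character as its field separator.
tab=$(printf '\t')

# 1. Change the extension from .snp to .txt.
f=Test_Data.snp
mv "$f" "${f%.snp}.txt"

# 2. Lines with an A base call in column 2 or column 3.
awk -F'\t' '$2 == "A" || $3 == "A"' Test_Data.txt > a_calls.txt

# 3. Only the [LEN R] column (assumed to be column 5 here).
cut -f5 Test_Data.txt > Reference_length.txt

# 4. Sort numerically by column 4 ([P2]).
sort -t "$tab" -k4,4n Test_Data.txt > sorted_by_p2.txt

# 5. Lines with a base call in both column 2 and column 3,
#    sorted numerically by [LEN R] (assumed column 5).
awk -F'\t' '$2 ~ /^[ACGT]$/ && $3 ~ /^[ACGT]$/' Test_Data.txt |
    sort -t "$tab" -k5,5n > snp_report.txt
```

The awk patterns do the row filtering, cut picks a column, and sort orders the rows; the `n` after the key tells sort to compare numbers rather than text.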
It's now up to you to actually deconstruct them per your study guide(s) or texts. HTH.
Thank you. I really appreciate it. I've been having a tough time not only figuring out the problem set, but asking for help on the forums. There's just a lot of jargon that I simply don't know. I appreciate your understanding and help.
---------- Post updated at 07:09 PM ---------- Previous update was at 01:38 AM ----------
I just got all the outputs I wanted, but I'm still not sure how to "delimit" the columns and remove the header so I can use the data in UNIX.
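For the header and delimiter part, one possible approach is sketched below. It assumes that every data line starts with a number and that no header line does, and that the columns are separated by runs of spaces. The raw file contents here are invented for the example, so check both assumptions against the real file first.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical raw file: a few header lines, then space-aligned columns.
cat > Test_Data.snp <<'EOF'
some title line

    [P1]  [SUB]  [SUB]     [P2]
========================================
      10      A      G       30
      20      C      .       12
EOF

# Keep only the lines whose first field is a number (this skips the header),
# then rebuild each kept line with a single tab between fields.
# Assigning $1 = $1 makes awk re-join the fields using OFS.
awk -v OFS='\t' '$1 ~ /^[0-9]+$/ { $1 = $1; print }' Test_Data.snp > Test_Data.txt
```

If the header always has the same number of lines, `tail -n +5 Test_Data.snp` or `sed '1,4d' Test_Data.snp` would drop a four-line header instead, but matching on the first field avoids having to count lines.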
I was hoping someone could sort of point me in the right direction on how to solve this problem.
What I have to do is compare two sets of numbers. What I need to find is:
The numbers that are the same between both sets
and the numbers that are unique to EACH set.
The two number sets are two different files also.
I've come a little farther than I used to be, so I'm not totally oblivious to UNIX now, but this is seriously still the hardest question I've had, so I'm clearly not great.
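One common way to do this kind of comparison is with sort and comm. The sketch below assumes each file holds one number per line; the file names set_a.txt and set_b.txt and the numbers in them are made up for illustration.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: one number per line in each file.
printf '%s\n' 5 3 9 12 > set_a.txt
printf '%s\n' 12 7 3 20 > set_b.txt

# comm needs both inputs sorted in the same (text) order;
# sort -u also drops duplicate lines within each file.
sort -u set_a.txt > a.sorted
sort -u set_b.txt > b.sorted

# comm prints three columns: only in file 1, only in file 2, in both.
# Each -N option suppresses column N.
comm -12 a.sorted b.sorted > common.txt   # numbers present in both sets
comm -23 a.sorted b.sorted > only_a.txt   # numbers unique to set A
comm -13 a.sorted b.sorted > only_b.txt   # numbers unique to set B
```

Note that the files are sorted as text, not numerically, because that is the order comm expects. If you want the results in numeric order, pipe each output through `sort -n` afterwards.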