Hello,
I have a large CSV file in which all of the values sit on one very long row (i.e. there are no line breaks until the end of the file, and the values are separated by commas).
The file contains two types of values. One type begins with the letters rs followed by some numbers. The other begins with the letter i followed by some numbers. An example is below (the IDs are genome identifiers).
rs28931576,rs11542040,rs28931577,rs429358,i6007484,i6007510,rs28931578,i6007500,i6007489,i5000217,i6007504,i6007493,rs769455,i6007507,i6007497,i6007512,i6007495,i6007485,i6007492,i5000216,i5000205,rs7412
My Unix command line knowledge was enough to use the cat and cut commands to get the above data to this point.
I can't figure out how to remove all of the values that begin with the letter i. I've tried some awk and egrep commands, but I don't have the mastery yet to work this out.
I also need a way to get rid of duplicate commas after the i values are removed.
Right now, I'm using Find and Replace in TextEdit on a Mac to do these steps, but I'd love to be able to script this.
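For reference, here is a minimal sketch of the kind of pipeline that might do this, using a short sample in the same format. It is only one possible approach: it assumes every value is separated by a single comma and that nothing other than the unwanted values starts with a lowercase i. The variable names sample and result are just placeholders for illustration.

```shell
# Short sample in the same one-row, comma-separated format as the real file.
sample='rs28931576,i6007484,i6007510,rs429358,i5000217,rs7412'

# 1. tr puts each value on its own line.
# 2. grep -v '^i' drops every line that starts with the letter i.
# 3. paste -sd, - joins the remaining lines back into one row with single commas.
result=$(printf '%s\n' "$sample" | tr ',' '\n' | grep -v '^i' | paste -sd, -)

echo "$result"
# prints: rs28931576,rs429358,rs7412
```

Because the values are split apart and then rejoined, no doubled commas are left behind, so a separate cleanup step should not be needed. On the real file the same pipeline would read from it instead of from printf, for example tr ',' '\n' < input.csv | grep -v '^i' | paste -sd, - > output.csv, where input.csv and output.csv are hypothetical file names.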
Any help is much appreciated!