Hello,
I am new to Linux and am looking for a good way to modify a column in a text file.
Here is the file I want to modify. It has 8 columns (and thousands of rows). I want to change the first column by adding "chr" in front of the numbers. Some rows have a string in the first column, and I don't want to change those.
1 . miRNA 548816 548893 . + . ACC="MI0002023"; ID="dre-mir-155";
1 . miRNA 1651461 1651541 . + . ACC="MI0002180"; ID="dre-mir-459";
1 . miRNA 23269491 23269603 . - . ACC="MI0004786"; ID="dre-mir-740";
1 . miRNA 27656240 27656327 . + . ACC="MI0002052"; ID="dre-mir-218a-2";
1 . miRNA 34527751 34527843 . + . ACC="MI0004780"; ID="dre-mir-734";
1 . miRNA 40174414 40174523 . + . ACC="MI0010857"; ID="dre-mir-2197";
1 . miRNA 46862496 46862635 . - . ACC="MI0001895"; ID="dre-mir-16b";
1 . miRNA 46862739 46862822 . - . ACC="MI0001891"; ID="dre-mir-15a-1";
1 . miRNA 55355143 55355233 . - . ACC="MI0004765"; ID="dre-mir-722";
2 . miRNA 1085488 1085564 . + . ACC="MI0002181"; ID="dre-mir-460";
2 . miRNA 6031391 6031475 . + . ACC="MI0002000"; ID="dre-mir-137-1";
2 . miRNA 22105590 22105669 . - . ACC="MI0004782"; ID="dre-mir-736";
2 . miRNA 23568780 23568883 . - . ACC="MI0010841"; ID="dre-mir-2190";
2 . miRNA 25338635 25338716 . - . ACC="MI0001966"; ID="dre-mir-124-1";
2 . miRNA 31878456 31878533 . + . ACC="MI0001916"; ID="dre-mir-23a-3";
2 . miRNA 31880346 31880476 . + . ACC="MI0001928"; ID="dre-mir-27a";
2 . miRNA 34798348 34798457 . + . ACC="MI0010847"; ID="dre-mir-2198";
2 . miRNA 44164796 44164904 . - . ACC="MI0001366"; ID="dre-mir-181b-1";
2 . miRNA 57907954 57908073 . - . ACC="MI0001879"; ID="dre-mir-7a-3";
Is there a simple way to change the first column? Any help would be appreciated.
Thanks
system
December 29, 2010, 1:04am
2
Input
$ cat file
1 . miRNA 548816 548893 . + . ACC="MI0002023"; ID="dre-mir-155";
1 . miRNA 1651461 1651541 . + . ACC="MI0002180"; ID="dre-mir-459";
1 . miRNA 23269491 23269603 . - . ACC="MI0004786"; ID="dre-mir-740";
1 . miRNA 27656240 27656327 . + . ACC="MI0002052"; ID="dre-mir-218a-2";
1 . miRNA 34527751 34527843 . + . ACC="MI0004780"; ID="dre-mir-734";
1 . miRNA 40174414 40174523 . + . ACC="MI0010857"; ID="dre-mir-2197";
1 . miRNA 46862496 46862635 . - . ACC="MI0001895"; ID="dre-mir-16b";
str . miRNA 46862739 46862822 . - . ACC="MI0001891"; ID="dre-mir-15a-1";
1 . miRNA 55355143 55355233 . - . ACC="MI0004765"; ID="dre-mir-722";
2 . miRNA 1085488 1085564 . + . ACC="MI0002181"; ID="dre-mir-460";
2 . miRNA 6031391 6031475 . + . ACC="MI0002000"; ID="dre-mir-137-1";
str . miRNA 22105590 22105669 . - . ACC="MI0004782"; ID="dre-mir-736";
2 . miRNA 23568780 23568883 . - . ACC="MI0010841"; ID="dre-mir-2190";
2 . miRNA 25338635 25338716 . - . ACC="MI0001966"; ID="dre-mir-124-1";
2 . miRNA 31878456 31878533 . + . ACC="MI0001916"; ID="dre-mir-23a-3";
2 . miRNA 31880346 31880476 . + . ACC="MI0001928"; ID="dre-mir-27a";
2 . miRNA 34798348 34798457 . + . ACC="MI0010847"; ID="dre-mir-2198";
2 . miRNA 44164796 44164904 . - . ACC="MI0001366"; ID="dre-mir-181b-1";
2 . miRNA 57907954 57908073 . - . ACC="MI0001879"; ID="dre-mir-7a-3";
Command
sed 's/^[0-9]/chr&/' file
Output
chr1 . miRNA 548816 548893 . + . ACC="MI0002023"; ID="dre-mir-155";
chr1 . miRNA 1651461 1651541 . + . ACC="MI0002180"; ID="dre-mir-459";
chr1 . miRNA 23269491 23269603 . - . ACC="MI0004786"; ID="dre-mir-740";
chr1 . miRNA 27656240 27656327 . + . ACC="MI0002052"; ID="dre-mir-218a-2";
chr1 . miRNA 34527751 34527843 . + . ACC="MI0004780"; ID="dre-mir-734";
chr1 . miRNA 40174414 40174523 . + . ACC="MI0010857"; ID="dre-mir-2197";
chr1 . miRNA 46862496 46862635 . - . ACC="MI0001895"; ID="dre-mir-16b";
str . miRNA 46862739 46862822 . - . ACC="MI0001891"; ID="dre-mir-15a-1";
chr1 . miRNA 55355143 55355233 . - . ACC="MI0004765"; ID="dre-mir-722";
chr2 . miRNA 1085488 1085564 . + . ACC="MI0002181"; ID="dre-mir-460";
chr2 . miRNA 6031391 6031475 . + . ACC="MI0002000"; ID="dre-mir-137-1";
str . miRNA 22105590 22105669 . - . ACC="MI0004782"; ID="dre-mir-736";
chr2 . miRNA 23568780 23568883 . - . ACC="MI0010841"; ID="dre-mir-2190";
chr2 . miRNA 25338635 25338716 . - . ACC="MI0001966"; ID="dre-mir-124-1";
chr2 . miRNA 31878456 31878533 . + . ACC="MI0001916"; ID="dre-mir-23a-3";
chr2 . miRNA 31880346 31880476 . + . ACC="MI0001928"; ID="dre-mir-27a";
chr2 . miRNA 34798348 34798457 . + . ACC="MI0010847"; ID="dre-mir-2198";
chr2 . miRNA 44164796 44164904 . - . ACC="MI0001366"; ID="dre-mir-181b-1";
chr2 . miRNA 57907954 57908073 . - . ACC="MI0001879"; ID="dre-mir-7a-3";
Note that in the output the rows starting with 'str' are unchanged. The pattern ^[0-9] matches only lines that begin with a digit, and & re-inserts the matched digit after 'chr'.
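The sed pattern is anchored at the start of the line, so it tests only the first character. If some of the string rows could begin with a digit, matching the whole first field may be safer. Here is a minimal sketch with awk, assuming whitespace-separated columns; the three-line sample and the temporary file are made up for illustration. Note that assigning to $1 makes awk rebuild the line with single spaces between fields.

```shell
# Hypothetical sample: two numeric rows and one row with a string in column 1.
input=$(mktemp)
cat > "$input" <<'EOF'
1 . miRNA 548816 548893 . + . ACC="MI0002023"; ID="dre-mir-155";
str . miRNA 46862739 46862822 . - . ACC="MI0001891"; ID="dre-mir-15a-1";
2 . miRNA 1085488 1085564 . + . ACC="MI0002181"; ID="dre-mir-460";
EOF

# Prefix "chr" only when the first field consists entirely of digits.
# The trailing "1" is an always-true pattern, so every line is printed.
result=$(awk '$1 ~ /^[0-9]+$/ { $1 = "chr" $1 } 1' "$input")
printf '%s\n' "$result"
```

To keep the result, redirect the awk output to a new file rather than to the input file itself. For a tab-delimited file, adding -F'\t' -v OFS='\t' to the awk call should preserve the tabs.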
I have a question about extracting information from a csv file. I have a very large file with 7 columns and a few thousand rows. I would like to search on one or two of these columns and write the matching rows to a text file.
For example, I want to search the "Name" column for mir-19b and extract all columns of the matching rows.
Here is the sample csv file.
Small RNA                  Expression values  Length  Count   Name                    Match type  Mismatches
TGTGCAAATCCATGCAAAACTGA    43,919             23      43,919  mir-19b                 Mature      0
CAGTGCAATATTAAAAGGGCAT     42,583             22      42,583  mir-130c-1//mir-130c-2  Mature      0
GTGAAATGTTCAGGACCACTTG     28,357             22      28,357  mir-203b                Mature      0
TTCCCTTTGTCATCCTATGCCT     27,297             22      27,297  mir-204-1//mir-204-2    Mature      0
TAAAGTGCTTATAGTGCAGGTAG    25,594             23      25,594  mir-20a                 Mature      1
CAGTGCAATAATGAAAGGGCAT     23,802             22      23,802  mir-130b                Mature      0
TCCTTCATTCCACCGGAGTCTG     17,791             22      17,791  mir-205                 Mature      2
TGTGCAAATCTATGCAAAACTGA    17,501             23      17,501  mir-19a                 Mature      0
TACCCTGTAGATCCGGATTTGT     17,431             22      17,431  mir-10c                 Mature      0
CAGTGCAATAGTATTGTCATAGCAT  17,203             25      17,203  mir-301c                Precursor   0
TGGAATGTAAGGAAGTGTGTGG     16,786             22      16,786  mir-206-1//mir-206-2    Mature      0
GTGAAATGTTTAGGACCACTTG     16,657             22      16,657  mir-203a                Mature      0
TGTGCAAATCCATGCAAAACTCG    14,449             23      14,449  mir-19c                 Mature      0
Any suggestions using Perl or Linux commands would be helpful.
joeyg
January 3, 2011, 3:08pm
5
After re-reading your follow-up, I think this should be its own question. Also, you refer to this as a csv file, but your sample does not appear to be comma-separated. It looks like a tab-delimited file.
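If the file really is tab-delimited, a filter on one column could look roughly like the following. This is only a sketch: the temporary file names are hypothetical, the two data rows are copied from the sample, and it assumes "Name" is the fifth tab-separated field, as the sample header suggests.

```shell
# Hypothetical tab-delimited sample built with printf (\t is a tab).
input=$(mktemp)
output=$(mktemp)
printf 'Small RNA\tExpression values\tLength\tCount\tName\tMatch type\tMismatches\n' > "$input"
printf 'TGTGCAAATCCATGCAAAACTGA\t43,919\t23\t43,919\tmir-19b\tMature\t0\n' >> "$input"
printf 'TGTGCAAATCTATGCAAAACTGA\t17,501\t23\t17,501\tmir-19a\tMature\t0\n' >> "$input"

# Keep the header (NR == 1) plus every row whose 5th field is exactly mir-19b,
# and write the result to a separate text file.
awk -F'\t' 'NR == 1 || $5 == "mir-19b"' "$input" > "$output"
cat "$output"
```

To search on two columns, the conditions can be combined with &&, for example $5 == "mir-19b" && $6 == "Mature". An exact comparison with == is used here so that mir-19b does not also match names such as mir-19b-2; a regular expression match with ~ would be the alternative if partial matches are wanted.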