I have a txt file like below:
Accession Orderable Unique ID Subject ID Timepoint Colletion Date/Time
12-170-0185 Glucose 1756487 1 D-07_H00.03 18-JUN-2012 06:03:00
12-170-0185 Insulin 1756487 1 D-07_H00.03 18-JUN-2012 06:03:00
12-170-0200 Glucose 1756487 1 D-07_H00.05 18-JUN-2012 06:05:00
12-170-0200 Insulin 1756487 1 D-07_H00.05 18-JUN-2012 06:05:00
12-170-0215 Glucose 1756487 1 D-07_H00.15 18-JUN-2012 06:15:00
......
Accession Orderable Unique ID Subject ID Timepoint Colletion Date/Time
12-170-0220 Glucose 1759452 3 D-07_H00.15 18-JUN-2012 06:15:00
12-170-0220 Insulin 1759452 3 D-07_H00.15 18-JUN-2012 06:15:00
12-170-0181 Insulin 1751183 4 D-07_H00.03 18-JUN-2012 06:03:00
......
The field I need to extract is the first one called "Accession".
I need to convert the Accession Number in the following format with length 18.
i.e.
12-198-0212 => "000002012198000212",
12-198-0214 => "000002012198000214",
12-198-0216 => "000002012198000216",
12-198-0218 => "000002012198000218",
12-198-0220 => "000002012198000220",
i.e.
0000020
is added in front of
12-198
with
- (dash)
taken out, and
00
is added in front of
-0212
with
- (dash)
taken out as well.
"000002012198000212"
has
" " ,
, and the double-quotes and comma are added for text delimiter file.
Other than "Accession" field, I don't need the rest of fields on the extracted output.
Please advise.
msabhi
August 24, 2012, 3:00pm
2
hope this helps
perl -alne 'if(/^\d+-\d+-\d+/) {@acc=split(/-/,$F[0]);printf("12-198-%s => \"00020%s19800%s\",\n",$acc[2],$acc[0],$acc[2]);}' input_file
Sorry, if I am not clear on i.e.
12-198
It is not static. It can be
12-170
or else.
i.e.
12-170-0185 Glucose 1756487 1 D-07_H00.03 18-JUN-2012 06:03:00
12-170-0185 Insulin 1756487 1 D-07_H00.03 18-JUN-2012 06:03:00
Please advise.
msabhi
August 24, 2012, 3:18pm
4
I thought it wouldn't be constant and had given the code for the same at first..later after going through your req again, i re-edited it since it seemed to be constant:) ..cool..here it goes
perl -alne 'if(/^\d+-\d+-\d+/) {@acc=split(/-/,$F[0]);printf("%s => \"00020%s%s00%s\",\n",$F[0],@acc);}' input_file
This is one way to do it:
$ cat t
Accession Orderable Unique ID Subject ID Timepoint Colletion Date/Time
12-170-0185 Glucose 1756487 1 D-07_H00.03 18-JUN-2012 06:03:00
12-170-0185 Insulin 1756487 1 D-07_H00.03 18-JUN-2012 06:03:00
12-170-0200 Glucose 1756487 1 D-07_H00.05 18-JUN-2012 06:05:00
12-170-0200 Insulin 1756487 1 D-07_H00.05 18-JUN-2012 06:05:00
12-170-0215 Glucose 1756487 1 D-07_H00.15 18-JUN-2012 06:15:00
12-170-0220 Glucose 1759452 3 D-07_H00.15 18-JUN-2012 06:15:00
12-170-0220 Insulin 1759452 3 D-07_H00.15 18-JUN-2012 06:15:00
12-170-0181 Insulin 1751183 4 D-07_H00.03 18-JUN-2012 06:03:00
$ sed -e '1d' -e 's/^\([[:digit:]]\{2\}\)-\([[:digit:]]\{3\}\)-\([[:digit:]]\{4\}\).*/"0000020\1\200\3"/g' t
"000002012170000185"
"000002012170000185"
"000002012170000200"
"000002012170000200"
"000002012170000215"
"000002012170000220"
"000002012170000220"
"000002012170000181"
It works great!
Two things I like to ask:
1) The output shows as below:
12-170-0185 => "0002012170000185",
12-170-0185 => "0002012170000185",
12-170-0200 => "0002012170000200",
12-170-0200 => "0002012170000200",
12-170-0215 => "0002012170000215",
i.e.
12-170-0185 =>
is not needed, just
"0002012170000185", "0002012170000185", etc
. So, I just need the converted Accession number.
2) If I can have another script showing the converted Accession Number along with the other fields, that will be great!
So, the output will look like;
"0002012170000185", Glucose 1756487 1 D-07_H00.03 18-JUN-2012 06:03:00
"0002012170000185", Insulin 1756487 1 D-07_H00.03 18-JUN-2012 06:03:00
"0002012170000200", Glucose 1756487 1 D-07_H00.05 18-JUN-2012 06:05:00
The reason I need to have two scripts (one with only Accession Number; the other with all fileds beginning with Accession Number) is for verifications as it will be very helpful to go through the rows.
Please advise.
msabhi
August 24, 2012, 3:36pm
7
Req 1>
perl -alne 'if(/^\d+-\d+-\d+/) {@acc=split(/-/,$F[0]);printf("\"00020%s%s00%s\",\n",@acc);}' input_file
Req 2>
perl -alne 'if(/^\d+-\d+-\d+/) {@acc=split(/-/,$F[0]);$F[0]=sprintf("\"00020%s%s00%s\",",@acc); print "@F"}'
input_file
1 Like
spacebar:
This is one way to do it:
$ cat t
Accession Orderable Unique ID Subject ID Timepoint Colletion Date/Time
12-170-0185 Glucose 1756487 1 D-07_H00.03 18-JUN-2012 06:03:00
12-170-0185 Insulin 1756487 1 D-07_H00.03 18-JUN-2012 06:03:00
12-170-0200 Glucose 1756487 1 D-07_H00.05 18-JUN-2012 06:05:00
12-170-0200 Insulin 1756487 1 D-07_H00.05 18-JUN-2012 06:05:00
12-170-0215 Glucose 1756487 1 D-07_H00.15 18-JUN-2012 06:15:00
12-170-0220 Glucose 1759452 3 D-07_H00.15 18-JUN-2012 06:15:00
12-170-0220 Insulin 1759452 3 D-07_H00.15 18-JUN-2012 06:15:00
12-170-0181 Insulin 1751183 4 D-07_H00.03 18-JUN-2012 06:03:00
$ sed -e '1d' -e 's/^\([[:digit:]]\{2\}\)-\([[:digit:]]\{3\}\)-\([[:digit:]]\{4\}\).*/"0000020\1\200\3"/g' t
"000002012170000185"
"000002012170000185"
"000002012170000200"
"000002012170000200"
"000002012170000215"
"000002012170000220"
"000002012170000220"
"000002012170000181"
This works well.
Is there a way to remove the header
Accession Orderable Unique ID Subject ID Timepoint Colletion Date/Time
and blank rows?
Please advise.
msabhi
August 24, 2012, 6:24pm
9
@Daniel : Just in case if you missed the answer, please refer to my last post...
Yes, that works fantastic! Really appreciate it!