Dear all
I have a dataset (tab-delimited text) which has 100 variables (say, var0-var99) and more than 100,000 observations. I want to do the following:
For variables var0-var49, I want to add "00" in front of each value (for example, "1" would become "001").
For variables var50-var99, I want to add an underscore _ in front of each value (for example, "1" would become "_1").
How should I write the script?
Please give us a concrete example of your input file format. (Use CODE tags.)
Thanks.
The raw data is like:
Var0 Var1 Var2 ... Var50 Var51 ... Var99
1 22 53 ... 3 76 ... 82
.
.
.
.
22 78 65 ... 89 7 ... 12
and I hope, after running code, the data will look like:
Var0 Var1 Var2 ... Var50 Var51 ... Var99
001 0022 0053 ... _3 _76 ... _82
.
.
.
.
0022 0078 0065 ... _89 _7 ... _12
Assuming that your actual data file has no headers:
awk -F'\t' '{for(i=1;i<=50 && i<=NF;i++) $i="00"$i;for(;i<=NF;i++) $i="_"$i}1' OFS='\t' file
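If the real file does carry the header row shown in the sample, the same idea can be adapted to leave line 1 alone. This is only a minimal sketch: the function name `prefix_fields` and its optional second argument (the number of leading fields that get "00", defaulting to 50) are made up for illustration and are not part of the answer above.

```shell
# Hypothetical helper: prefix_fields FILE [N]
# Copies line 1 (the header) unchanged, then prefixes fields 1..N with "00"
# and every later field with "_". N defaults to 50, as in the thread.
prefix_fields() {
    awk -F'\t' -v OFS='\t' -v n="${2:-50}" '
        NR == 1 { print; next }          # header row: pass through untouched
        {
            for (i = 1; i <= NF; i++)
                $i = (i <= n ? "00" : "_") $i
            print
        }' "$1"
}
```

It would be called as `prefix_fields file > file.new`. Assigning to `$i` makes awk rebuild the line with OFS, which is why OFS is set to a tab here as well.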
Yoda
January 13, 2013, 4:32pm
5
awk -F'\t' '{ for(i=1;i<=NF;i++) (i<=50)?$i="00"$i:$i="_"$i; }1' OFS='\t' file
RudiC
January 13, 2013, 6:09pm
6
This is certainly not as elegant as I wanted it to be, nor as elegant as the proposals above:
$ sed 's/\t\|^/&_/g; s/_/X/51; h; s/X.*$//; s/_/00/g; G; s/\n.*X/_/' file
I was erroneously thinking the s///NUMBER flag would allow for ranges like 1-50, but it doesn't, does it? So the entire thing ended up clumsy...
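For anyone trying to follow the hold-space one-liner, here is the same script spread over several lines with a comment per step. It is a sketch only: it assumes GNU sed (for `\t` and `\|`), the wrapper name `prefix_via_hold` is hypothetical, and its argument N is the first field that should get "_" (51 in the one-liner). As far as I can tell, the approach also assumes that the data contains no "_" or "X" of its own and that every line has at least N fields.

```shell
# Hypothetical helper: prefix_via_hold N  (reads stdin, assumes GNU sed)
prefix_via_hold() {
    sed '
        # 1. put "_" in front of every field (line start and after each tab)
        s/\t\|^/&_/g
        # 2. turn the Nth "_" into the marker X
        s/_/X/'"$1"'
        # 3. keep a copy of the marked line in the hold space
        h
        # 4. drop everything from the marker on, leaving fields 1..N-1
        s/X.*$//
        # 5. in what is left, swap each "_" for "00"
        s/_/00/g
        # 6. append a newline plus the saved copy
        G
        # 7. delete from the newline through the marker, restoring one "_"
        s/\n.*X/_/
    '
}

# Small sample with the split at field 3 instead of 51.
printf '1\t22\t53\t7\n' | prefix_via_hold 3
```

With GNU sed the sample line should come out as 001, 0022, _53 and _7, still separated by tabs.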
Sorry for the late reply and thank you all for great help.
@RudiC, it looks like you are using GNU sed, since \| and \t are extensions. GNU sed would allow something like this:
sed 's/\b[0-9]/_&/51g; s//00&/g' file
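To see what the combined NUMBER and g flags do, the command can be tried on a short line with a smaller split point. This is a sketch that assumes GNU sed; the wrapper name `prefix_from` and its argument N (the first field to receive "_") are invented for the example. It also assumes the values are plain unsigned integers, since a value such as 1.5 would contain a second word boundary before a digit.

```shell
# Hypothetical helper: prefix_from N  (reads stdin; GNU sed only)
prefix_from() {
    # "${1}g" acts on the Nth match and every later one (GNU behaviour).
    # The empty regex in the second command reuses \b[0-9]; fields that
    # already start with "_" no longer have a word boundary before the digit,
    # so only the earlier fields receive "00".
    sed "s/\b[0-9]/_&/${1}g; s//00&/g"
}

# Small sample with the split at field 3 instead of 51.
printf '1\t22\t53\t7\n' | prefix_from 3
```

With GNU sed this should give 001, 0022, _53 and _7, tab-separated. Other sed implementations may treat a number combined with g differently, or reject `\b` altogether.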
RudiC
January 15, 2013, 8:13am
9
That's what I had in mind - didn't know you can combine flags to the s command. It's certainly an advantage if you can read info sed: