jacks
June 1, 2010, 6:25am
1
Hi friends,
I have data in a tab-separated file with headers like this:
*sml1 *sml3 *smln7 smfk9 smllf56...
Which shell command should I use if I want to extract entire columns whose header names begin with "*"? I want to copy these columns into another file.
Thanks,
Try:
grep '^\*' file > newfile
How about:
awk '{for (i=1;i<=NF;i++) {if (NR==1 && $i~/^*/) A[i]; if (i in A) printf $i"\t"} print ""}' infile
jacks
June 1, 2010, 7:03am
5
It keeps selecting all columns.
jacks:
Thanks. It did not work.
Sorry, I misread the question, try:
tr ' ' '\n' < file | grep '^\*' > newfile
Hi, could you indicate where your input file differs from my test sample?
$ cat infile
*sml1 *sml3 *smln7 sm*fk9 smllf56 *tr yty
1 2 3 4 5 6 7
a b c d e f g
$ awk '{for (i=1;i<=NF;i++) {if (NR==1 && $i~/^*/) A[i]; if (i in A) printf $i"\t"} print ""}' infile
*sml1 *sml3 *smln7 *tr
1 2 3 6
a b c f
---------- Post updated at 13:15 ---------- Previous update was at 13:13 ----------
I tried it with your newly supplied input and I get this:
$ awk '{for (i=1;i<=NF;i++) {if (NR==1 && $i~/^*/) A[i]; if (i in A) printf $i"\t"} print ""}' infile2
*smt_0018 *smlf_0042 *sugk_0053
11.2902 10.7395 7.9672
8.98292 9.69812 7.7105
7.71334 8.05211 7.57295
9.07925 8.96038 8.1141
10.1289 9.17119 7.52616
11.5821 8.76748 8.19344
9.18122 10.1511 13.4099
This is my output with your file:
$ cat file
*smt_0018 smf_0031 *smlf_0042 *sugk_0053
11.2902 12.4807 10.7395 7.9672
8.98292 9.03125 9.69812 7.7105
7.71334 8.28308 8.05211 7.57295
9.07925 9.17628 8.96038 8.1141
10.1289 8.9778 9.17119 7.52616
11.5821 9.54838 8.76748 8.19344
9.18122 9.71467 10.1511 13.4099
$
$ tr ' ' '\n' < file | grep '^\*'
*smt_0018
*smlf_0042
*sugk_0053
Using your latest sample I get:
$ awk '{for (i=1;i<=NF;i++) {if (NR==1 && $i~/^*/) A[i]; if (i in A) printf $i"\t"} print ""}' sample
*sml_0018 *smmt_0031 *srmt_0053
1 1 1
11.2902 12.4807 7.9672
8.98292 9.03125 7.7105
7.71334 8.28308 7.57295
9.07925 9.17628 8.1141
10.1289 8.9778 7.52616
11.5821 9.54838 8.19344
9.18122 9.71467 13.4099
What do you get if you escape the *:
awk '{for (i=1;i<=NF;i++) {if (NR==1 && $i~/^\*/) A[i]; if (i in A) printf $i"\t"} print ""}' infile
Are you on Solaris? Then you need to use nawk or /usr/xpg4/bin/awk
Oh my, didn't realize it's a tab delimited file, try:
tr '\t' '\n' < file | grep '^\*' > newfile
Please use code tags.
jacks
June 1, 2010, 7:36am
11
Hi Scrutinizer,
This works nicely:
awk '{for (i=1;i<=NF;i++) {if (NR==1 && $i~/^\*/) A[i]; if (i in A) printf $i"\t"} print ""}' infile
Can you please tell me why it worked this time?
Thanks to you and Franklin.
Hi Jacks,
Glad we found the culprit. In a regular expression, * is a repetition operator: it matches zero or more occurrences of the preceding character. To match a literal "*" it normally has to be escaped with a backslash. In this particular case the * comes right at the start of the pattern (just after the ^ anchor), so there is nothing for it to repeat and it should not need an escape. Apparently, though, not every implementation of awk sees it that way.
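For anyone finding this thread later, here is a minimal sketch of the same idea written a little more defensively. It is only an illustration under a few assumptions: the file names sample.tsv and starred.tsv and the array name keep are made up, the sample data is invented, and the bracket expression [*] is used as one way to match a literal star without relying on how a given awk treats an unescaped or escaped *.

```shell
# Build a small tab-separated sample (hypothetical data, not the poster's file).
printf '*a\tb\t*c\n1\t2\t3\n4\t5\t6\n' > sample.tsv

# Keep only the columns whose header starts with a literal "*".
awk -F'\t' -v OFS='\t' '
NR == 1 {
    # Remember, in order, the positions of the starred header fields.
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[*]/) keep[++n] = i
}
{
    # Print the remembered fields of every line, tab-separated,
    # ending the line with ORS instead of a trailing tab.
    for (j = 1; j <= n; j++)
        printf "%s%s", $(keep[j]), (j < n ? OFS : ORS)
}' sample.tsv > starred.tsv
```

Setting -F'\t' splits on tabs only, so a header containing a space would stay in one field, and using printf with an explicit "%s" format avoids surprises if the data ever contains a % character.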
Hi jacks,
I tried the code given by Franklin52, and it works fine. Do you need the data as well, or just the column names?