copying columns whose headers match a specific pattern

jacks · June 1, 2010, 6:25am

Hi friends,
I have data in a tab-separated file with headers like this:

*sml1 *sml3 *smln7 smfk9 smllf56...

Which shell command should I use if I want to extract entire columns whose header names begin with "*"? I want to copy these columns into another file.

Thanks,
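One possible approach, sketched under the assumption that the file is strictly tab-separated and that the header is the first line: record the positions of the header fields whose first character is "*", then print only those fields on every line. The function name extract_star_columns and the file names in the usage line are hypothetical. Comparing the first character with substr() avoids any regex escaping issues with "*", and setting FS to a tab keeps empty fields in their place.

```shell
# extract_star_columns (hypothetical helper): print every tab-separated
# column whose header (first line) begins with a literal "*".
# Assumption: tab is the only field separator in the file.
extract_star_columns() {
    awk 'BEGIN { FS = OFS = "\t" }
    NR == 1 {
        # Remember the positions of the header fields starting with "*".
        for (i = 1; i <= NF; i++)
            if (substr($i, 1, 1) == "*")
                keep[++n] = i
    }
    {
        # On every line (header included), join the kept fields with tabs.
        line = ""
        for (j = 1; j <= n; j++)
            line = line (j > 1 ? OFS : "") $(keep[j])
        print line
    }' "$@"
}

# Usage (placeholder file names):
# extract_star_columns infile > newfile
```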

Franklin52 · June 1, 2010, 6:33am

Try:

grep '^\*' file > newfile

jacks · June 1, 2010, 6:37am

Thanks. It did not work.

Scrutinizer · June 1, 2010, 6:54am

How about:

awk '{for (i=1;i<=NF;i++) {if (NR==1 && $i~/^*/) A[i]; if (i in A) printf $i"\t"} print ""}' infile

jacks · June 1, 2010, 7:03am

It keeps selecting all columns.

Franklin52 · June 1, 2010, 7:04am

Sorry, I misread the question. Try:

tr ' ' '\n' < file | grep '^\*' > newfile

Scrutinizer · June 1, 2010, 7:15am

Hi, could you indicate where your input file differs from my test sample?

$ cat infile
*sml1   *sml3   *smln7  sm*fk9  smllf56 *tr     yty
1       2       3       4       5       6       7
a       b       c       d       e       f       g

$ awk '{for (i=1;i<=NF;i++) {if (NR==1 && $i~/^*/) A[i]; if (i in A) printf $i"\t"} print ""}' infile
*sml1   *sml3   *smln7  *tr
1       2       3       6
a       b       c       f

---------- Post updated at 13:15 ---------- Previous update was at 13:13 ----------

I tried it with your newly supplied input and I get this:

$ awk '{for (i=1;i<=NF;i++) {if (NR==1 && $i~/^*/) A[i]; if (i in A) printf $i"\t"} print ""}' infile2
*smt_0018       *smlf_0042      *sugk_0053
11.2902 10.7395 7.9672
8.98292 9.69812 7.7105
7.71334 8.05211 7.57295
9.07925 8.96038 8.1141
10.1289 9.17119 7.52616
11.5821 8.76748 8.19344
9.18122 10.1511 13.4099

Franklin52 · June 1, 2010, 7:15am

This is my output with your file:

$ cat file
*smt_0018 smf_0031 *smlf_0042 *sugk_0053
11.2902 12.4807 10.7395 7.9672
8.98292 9.03125 9.69812 7.7105
7.71334 8.28308 8.05211 7.57295
9.07925 9.17628 8.96038 8.1141
10.1289 8.9778 9.17119 7.52616
11.5821 9.54838 8.76748 8.19344
9.18122 9.71467 10.1511 13.4099
$
$ tr ' ' '\n' < file | grep '^\*'
*smt_0018
*smlf_0042
*sugk_0053

Scrutinizer · June 1, 2010, 7:29am

Using your latest sample I get:

$ awk '{for (i=1;i<=NF;i++) {if (NR==1 && $i~/^*/) A[i]; if (i in A) printf $i"\t"} print ""}' sample
*sml_0018       *smmt_0031      *srmt_0053
1       1       1
11.2902 12.4807 7.9672
8.98292 9.03125 7.7105
7.71334 8.28308 7.57295
9.07925 9.17628 8.1141
10.1289 8.9778  7.52616
11.5821 9.54838 8.19344
9.18122 9.71467 13.4099

What do you get if you escape the *?

awk '{for (i=1;i<=NF;i++) {if (NR==1 && $i~/^\*/) A[i]; if (i in A) printf $i"\t"} print ""}' infile

Are you on Solaris? If so, you need to use nawk or /usr/xpg4/bin/awk instead of the old /usr/bin/awk.
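If a script has to run on Solaris as well as other systems, one way to follow that advice is to pick the awk binary up front. This is only a sketch: it assumes /usr/xpg4/bin/awk, when present, is the preferred POSIX awk, falls back to /usr/bin/nawk, and otherwise uses whatever awk is on the PATH. The function name pick_awk is hypothetical, and backticks are used so the snippet does not depend on $(...) support.

```shell
# pick_awk (hypothetical helper): print the path/name of the awk to use.
# Assumption: prefer /usr/xpg4/bin/awk (Solaris), then nawk, then plain awk.
pick_awk() {
    for candidate in /usr/xpg4/bin/awk /usr/bin/nawk; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo awk
}

AWK=`pick_awk`
# Then call "$AWK" wherever the script would call awk, for example:
# "$AWK" '{ print $1 }' infile
```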

Franklin52 · June 1, 2010, 7:33am

Oh my, I didn't realize it's a tab-delimited file. Try:

tr '\t' '\n' < file | grep '^\*' > newfile

Please use code tags.
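Note that the tr/grep pipeline processes every line of the file, so it prints the matching header names (plus any data cell that happens to start with *), not the columns themselves. If only the header names are wanted, a small variation restricts it to the first line. The function name list_star_headers is hypothetical, and a tab-separated file is assumed.

```shell
# list_star_headers (hypothetical helper): print, one per line, the
# header names in the first line of the tab-separated file "$1" that
# begin with a literal "*".
list_star_headers() {
    head -n 1 "$1" | tr '\t' '\n' | grep '^\*'
}

# Usage (placeholder file names):
# list_star_headers file > newfile
```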

jacks · June 1, 2010, 7:36am

Hi Scrutinizer,

This works nicely:

awk '{for (i=1;i<=NF;i++) {if (NR==1 && $i~/^\*/) A[i]; if (i in A) printf $i"\t"} print ""}' infile

Can you please tell me why it worked this time?

Thanks to you and Franklin.
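For reference, this is how that one-liner works: on the header line (NR==1) it creates an element A[i] for every field number i whose text starts with a literal *, and on every line it prints only the fields whose number is in A, each followed by a tab. Below is a sketch of the same logic spread out with comments. One small change from the posted version: it uses printf "%s\t", $i instead of printf $i"\t", so that a % sign in the data is not interpreted as a format directive. The function name star_columns is hypothetical.

```shell
# star_columns (hypothetical helper): same logic as the one-liner,
# splitting fields on whitespace (awk default FS).
star_columns() {
    awk '{
        for (i = 1; i <= NF; i++) {
            # Header line: remember each field number whose text starts with "*".
            if (NR == 1 && $i ~ /^\*/) A[i]
            # Every line (header included): print the remembered fields.
            if (i in A) printf "%s\t", $i
        }
        print ""    # terminate the output line
    }' "$@"
}

# Usage (placeholder file names):
# star_columns infile > newfile
```

As in the original, each output line ends with a trailing tab.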

Scrutinizer · June 1, 2010, 7:58am

Hi Jacks,

Glad we found the culprit. In a regular expression, * is a special character: it repeats the preceding character (zero or more times). To match a literal "*", it normally has to be escaped with a backslash. However, in this particular case the * is the first character of the pattern (right after the ^ anchor), so there is nothing for it to repeat and one would not expect it to need an escape. But awk uses extended regular expressions, where POSIX leaves a leading * undefined, and apparently not every implementation of awk treats it literally.
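A way to avoid depending on how a particular awk treats a leading *: put the asterisk in a bracket expression, [*], where it is always literal in POSIX regular expressions. The sketch below shows both the bracket form and the escaped form for comparison; the function names are hypothetical and the input is assumed to be tab-separated.

```shell
# star_fields_bracket / star_fields_escaped (hypothetical helpers):
# print each tab-separated field, from every input line, that begins
# with a literal "*".
star_fields_bracket() {
    # Inside [ ], "*" has no special meaning.
    awk -F '\t' '{ for (i = 1; i <= NF; i++) if ($i ~ /^[*]/) print $i }'
}
star_fields_escaped() {
    # Backslash-escaped "*", as in the version that worked above.
    awk -F '\t' '{ for (i = 1; i <= NF; i++) if ($i ~ /^\*/) print $i }'
}

# Usage:
# printf '*sml1\tsmfk9\tsm*fk9\t*tr\n' | star_fields_bracket
```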

bankimmehta · June 1, 2010, 7:59am

Hi jacks,
I tried the code given by Franklin52 and it works fine. Do you need the data as well, or just the column names?