Column grep

css136 · September 30, 2010, 2:18am

Hi everyone,

I am looking for an easy way to extract columns from a text file based on a regular expression: something like grep, but matching the pattern against the column headers (the first line) and returning whole columns instead of lines.

For example, suppose I have the following file, 'file.txt':

A B C D B
1 2 3 4 5
6 7 8 9 0
5 6 7 8 9
2 3 8 9 0

I would like a command, let's call it transpose_grep, that does this:

cat file.txt | transpose_grep 'C'

It would return:

C
3
8
7
8

To give another example:

cat file.txt | transpose_grep 'B'

would return two columns, since 'B' appears twice:

B B
2 5
7 0
6 9
3 0

Is there a unix tool I can use to do this? Thanks for the help!

kurumi · September 30, 2010, 2:57am

$ ruby -ane 'BEGIN{h={}};$F.each_with_index{|x,y|h[y+1]=x} if $.==1; h.select {|k,v| print "#{$F[k-1]} " if v == "B"} and puts if $.>1 ' file
2 5
7 0
6 9
3 0

$ ruby -ane 'BEGIN{h={}};$F.each_with_index{|x,y|h[y+1]=x} if $.==1; h.select {|k,v| print "#{$F[k-1]} " if v == "C"} and puts if $.>1 ' file
3
8
7
8

matrixmadhan · September 30, 2010, 3:15am

I tried it for a single occurrence; the same approach can be extended to handle multiple occurrences.

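use strict;
use warnings;

# Usage: perl colgrep.pl PATTERN FILE
my ( $pattern, $file ) = @ARGV;
open( my $fh, '<', $file ) or die "Cannot open $file: $!\n";

my $data = <$fh>;
chomp($data);
my @fields = split( ' ', $data );

# Find the position of the first header field matching the pattern
my $cnt = 0;
foreach my $d (@fields) {
    last if ( $d =~ /$pattern/ );
    $cnt++;
}
die "No column matches '$pattern'\n" if $cnt == @fields;

# Print the matched header, then the field at that position on every line
print $fields[$cnt], "\n";
while ( $data = <$fh> ) {
    chomp($data);
    my $val = ( split( ' ', $data ) )[$cnt];
    print $val, "\n";
}

close($fh);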

radoulov · September 30, 2010, 3:38am

And another one:

perl -lane'BEGIN { $k = shift }
  @cols = grep $F[$_] eq $k , 0..$#F
    if $. == 1; 
  print "@F[@cols]"
  ' <pattern> infile

Given your sample data it outputs the following result:

% perl -lane'BEGIN { $k = shift }
  @cols = grep $F[$_] eq $k , 0..$#F
    if $. == 1;
  print "@F[@cols]"
  ' B infile 
B B
2 5
7 0
6 9
3 0
% perl -lane'BEGIN { $k = shift }
  @cols = grep $F[$_] eq $k , 0..$#F
    if $. == 1;
  print "@F[@cols]"
  ' C infile 
C
3
8
7
8

frans · September 30, 2010, 5:17am

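#!/bin/bash
# Usage: $0 COLUMN_NAME   (reads the file named "file")
COLS=$(head -n 1 file)
i=0
IDX=
for C in $COLS
do
  ((i++))
  # Remember the 1-based position of every header field equal to $1
  [ "$C" = "$1" ] && IDX+="$i,"
done
[ -n "$IDX" ] || { echo "no column named '$1'" >&2; exit 1; }
cut -d' ' -f"${IDX%,}" file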

css136 · September 30, 2010, 3:16pm

Hey everyone, this is perfect, thanks! The bash script is exactly what I was looking for (but the Perl and Ruby versions are great too).
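The bash script above reads the file by name twice (head for the header, cut for the data), so it cannot take input from a pipe as in `cat file.txt | transpose_grep 'C'`, and it compares headers with = rather than a regular expression. Below is a minimal sketch addressing both, under the same assumption the cut approach already makes (fields separated by exactly one space, as in the sample): read only the header with `read`, test each field with bash's `[[ =~ ]]`, then send the header plus the rest of stdin through cut. The function name transpose_grep is taken from the question; it is a hypothetical helper, not an existing tool.

```shell
#!/bin/bash
# transpose_grep: hypothetical name taken from the question.
# Reads single-space-separated data on stdin and prints every column
# whose header (first line) matches the bash regex given as $1.
transpose_grep() {
  local pattern=$1 header col idx='' i=0
  local -a fields
  # Consume only the header line; the rest of stdin stays unread for cut.
  IFS= read -r header || return 1
  read -r -a fields <<< "$header"
  for col in "${fields[@]}"; do
    i=$((i + 1))
    # [[ =~ ]] does an unanchored extended-regex match, like grep -E.
    if [[ $col =~ $pattern ]]; then
      idx+="$i,"
    fi
  done
  if [ -z "$idx" ]; then
    echo "no column matches '$pattern'" >&2
    return 1
  fi
  # Re-emit the header, then pass it and the remaining lines through cut.
  { printf '%s\n' "$header"; cat; } | cut -d' ' -f "${idx%,}"
}
```

With this defined, `cat file.txt | transpose_grep 'B'` prints the two B columns shown in the question. Because the match is unanchored, 'B' would also select a header such as 'AB'; use '^B$' when an exact match is wanted.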

rdcwayx · September 30, 2010, 11:41pm

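var=B
awk -v s="$var" '
# Header line: record the positions of all fields equal to s
NR==1 {for (i=1;i<=NF;i++) if ($i==s) a[++j]=i}
# Every line: print the recorded fields, space-separated
{for (i=1;i<=j;i++) printf "%s%s", $(a[i]), (i<j ? OFS : ORS)}
' infile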
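The awk answer compares each header with ==, an exact string match. Since the question asks for a regular expression, here is a sketch that swaps in awk's ~ operator and wraps the program in a function that reads stdin. The pattern is passed through the environment (the variable name TG_PAT is an arbitrary, hypothetical choice) instead of -v, because awk processes backslash escapes in -v assignments and would alter patterns such as '\.'. With awk's default field splitting, runs of spaces and tabs count as one separator, so this also copes with input that cut -d' ' would misalign.

```shell
#!/bin/bash
# transpose_grep: hypothetical name taken from the question.
# $1 is an awk extended regular expression tested against each header field.
transpose_grep() {
  TG_PAT=$1 awk '
    BEGIN { pat = ENVIRON["TG_PAT"] }
    # Header line: remember the position of every field matching pat.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i ~ pat) cols[++n] = i }
    # Every line, header included: print the selected fields, space-separated.
    n > 0 {
      for (k = 1; k <= n; k++) printf "%s%s", $(cols[k]), (k < n ? OFS : ORS)
    }
    END {
      if (n == 0) { print "no column matches \"" pat "\"" > "/dev/stderr"; exit 1 }
    }
  '
}
```

Usage matches the question: `cat file.txt | transpose_grep 'C'` prints the C column with its header. Anchor the pattern ('^B$') if headers may contain the search string as a substring.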