Inserting a header with column numbers into a 1.6 GB file with special spacing

sogi · November 25, 2009, 2:08am

Hi;

I've been searching posts to find a solution to what I'm trying to do, but I have not found anything yet.

I have a file (file1) with 300K columns and 1411 rows. The columns have no header at all (no column numbers), and I'm trying to fetch the information from specific columns.

The only way for me to fetch the information from the file is by column number; I have 412 column numbers (file2).

Here is an example of file 1 (tab separated):

1  23  21  24  12  22 .......etc until column 300K
1  23  21  24  12  22
1  23  21  24  12  22
1  23  21  24  12  22
1  23  21  24  12  22
1  23  21  24  12  22
1  23  21  24  12  22
1  23  21  24  12  22
.
.
.
.
.
etc until row 1411

In file 1 the columns need to be numbered like this:

1  2   3   4   5   6
1  23  21  24  12  22
1  23  21  24  12  22
1  23  21  24  12  22
1  23  21  24  12  22
1  23  21  24  12  22
1  23  21  24  12  22
1  23  21  24  12  22

File 2 has only 1 column and 412 rows; each row is a column number for file 1:

123
955
1045
1184
2323
2328
2333
2756
3364
4377
5259
5351
5778
7632
8603
9399
9561
10469
.
.
.
.
until row 412

I tried transposing the large file first to give it row numbers, but the file is too big and I run out of memory. So transposing is out of the question at this point.

Thank you in advance for any help!

thegeek · November 25, 2009, 3:33am

Translate the newline-delimited file to space-delimited (if need be, change the space to a tab):

tr '\n' ' ' < f2 > f3

Print the header, then the file:

sed 'r f1' f3
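
If the header should instead number every column 1, 2, 3, ... as in the example in the question, something along these lines might work. This is only a sketch on toy data: the file names sample.tsv and sample_with_header.tsv and their contents are made up for illustration, and it assumes tab-separated input in which the first row has the same number of fields as every other row. It has not been tried on 300K columns; some awk implementations may limit the number of fields per record.

```shell
# Work in a scratch directory with a tiny stand-in for file1
# (hypothetical sample: 2 rows, 4 tab-separated columns).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1\t23\t21\t24\n1\t23\t21\t24\n' > sample.tsv

# On the first row, count the fields and print 1..NF as a header line,
# then print every row unchanged. Only one row is held at a time.
awk -F'\t' '
NR == 1 {
    for (i = 1; i <= NF; i++)
        printf "%s%s", i, (i < NF ? "\t" : "\n")
}
{ print }
' sample.tsv > sample_with_header.tsv
```

Because awk processes the input row by row, this avoids the memory problem that transposing ran into.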

sogi · November 25, 2009, 9:45pm

thegeek;

It works. Thank you. Now I'm trying to use grep to fetch only the columns from file 1 (which now has a column number header) whose column numbers are listed in file2:

file="file2"
while read line
do
grep $line file1 >> file4
done < $file

But this code does not work: it gives me an even bigger file than file1, when the result should be much smaller.

rdcwayx · November 27, 2009, 7:04am

The above script can be replaced by:

grep -f file2 file1 >> file4
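
Note that grep selects whole lines, not columns. Each number in file2 is treated as a pattern that can match anywhere in a row, so nearly every row of file1 is likely to match, and in the loop version a row is appended once for every pattern it matches, which would explain an output larger than file1. To pull columns by position, a column-oriented tool such as cut may be a better fit. The sketch below assumes file1 is tab separated (cut's default delimiter) and that file2 holds one column number per line; the toy contents are made up for illustration. No header row is needed for this approach.

```shell
# Scratch directory with tiny stand-ins for file1 and file2
# (hypothetical sample data, not the real files).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1\t23\t21\t24\t12\t22\n1\t33\t31\t34\t13\t32\n' > file1
printf '2\n4\n5\n' > file2

# Join the lines of file2 with commas, giving a field list such as "2,4,5".
cols=$(paste -s -d, file2)

# cut reads file1 one line at a time and keeps only the listed fields,
# so it does not need to hold the whole file in memory.
cut -f "$cols" file1 > file4
```

cut writes the selected columns in the order they appear in file1, regardless of the order of the list, which matches the ascending numbers shown for file2.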