Select a range of columns from a matrix in Bash

I have a huge matrix file which looks like this (example matrix):

1 2 3 5
4 5 6 7
7 6 8 9
1 2 4 2
7 6 5 1
3 2 1 9

As one can see, this matrix has 4 columns and 6 rows. But my original matrix has some 3 million rows and 6000 columns.

For example, on this matrix I can define my task as "to extract the first 3 columns from this matrix and store in another file".
Hence, my new file will look like this:

1 2 3
4 5 6
7 6 8
1 2 4
7 6 5
3 2 1

For my huge matrix, I want to extract columns 1 to 2000 (inclusive) into one file and columns 3000 to 6000 into a second file. In other words: given a range, extract the columns within that range.

I tried the command below to extract the first 2000 columns. My plan was to reuse it for columns 3000 to 6000 just by changing the bounds of the for loop:

awk -F" " '{for(x=1;x<=2000;x++) { printf "%s\n",$x}}' matrix1.mtx

But the above code does not work as expected: it prints every value on its own line instead of keeping the rows.

That's what cut is meant for:

cut -d ' ' -f 1-2000 "$file" > "$newfile"
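
The second range works the same way, only the field list changes. Here is a small sketch on the 4-column example from the question; the file names (matrix_small.mtx, first_range.txt, second_range.txt) are made up for the demo, and on the real matrix the lists would be 1-2000 and 3000-6000. It assumes the columns are separated by exactly one space, because cut treats every single delimiter as a field boundary.

```shell
# Sample data: the 6x4 example matrix from the question (hypothetical file name).
printf '%s\n' '1 2 3 5' '4 5 6 7' '7 6 8 9' '1 2 4 2' '7 6 5 1' '3 2 1 9' > matrix_small.mtx

# Columns 1 through 3, inclusive (use 1-2000 on the real matrix).
cut -d ' ' -f 1-3 matrix_small.mtx > first_range.txt

# Columns 2 through 4, inclusive (use 3000-6000 on the real matrix).
cut -d ' ' -f 2-4 matrix_small.mtx > second_range.txt
```

Both ends of a range given to -f are included, so 1-2000 covers column 1 and column 2000.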

Your code also works; the output just isn't formatted, that's all. It prints each field on its own line.

Try this:

awk -F" " '{ line=$1; for(x=2;x<=2000;x++) {line=line" "$x} print line }' matrix1.mtx
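
To avoid editing the loop for every range, the bounds can be passed in as awk variables. This is only a sketch on the small example matrix from the question: first and last are names I picked for the demo, and so are the file names. On the real matrix you would use first=3000 and last=6000.

```shell
# Sample data: the 6x4 example matrix from the question (hypothetical file name).
printf '%s\n' '1 2 3 5' '4 5 6 7' '7 6 8 9' '1 2 4 2' '7 6 5 1' '3 2 1 9' > matrix_small.mtx

# first/last are hypothetical variable names set with -v; both ends are included.
awk -v first=2 -v last=4 '{
    line = $first                                            # start with the first column of the range
    for (x = first + 1; x <= last; x++) { line = line " " $x }  # append the rest, space separated
    print line                                               # one output row per input row
}' matrix_small.mtx > cols_2_to_4.txt
```

With the default field splitting awk also copes with several blanks or tabs between columns, which cut -d ' ' does not.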

regards,
Ahamed
