Dynamic file generation using shell

rspwilliam · September 18, 2014, 3:33am

I have to generate the file dynamically from the source file based on the below control file.

control_file.txt 
1,3,5,-1,8,-1,4

The control file contains the positions of the columns I need from the source file: for example column 1, column 3, column 5, a blank column (-1 indicates a blank column), and so on.

I have written a shell script that reads the positions from the control file one at a time and writes one file per position; for -1 values it creates an empty file with touch. Finally it uses paste to build the new file, with the pieces taken in sequence order via ls -v.

So my existing shell looks like below,

if [ "$position" != -1 ]
then
cut -d, -f$position > file_$var.csv 
else
touch file_$var.csv
fi
paste -d, $(ls -v file_*.csv) > newe_file.csv

I hope there is a way to minimize the file I/O. I am looking for something like the following:

cut -d, -f1,3,5 > file1.csv
touch file2.csv
cut -d, -f8 > file3.csv
touch file4.csv
cut -d, -f4 > file5.csv

A better solution would be great too.

The number of columns in the source files will be in the hundreds.

Expected results:

input-file is sample.csv

col1,col2,col3,col4,col5,col6,col7,col8
1,2,3,4,5,6,7,8
9,10,11,12,13,14,15,16

output.csv

col1,col3,col5,-1,col8,-1,col4
1,3,5,,8,,4
9,11,13,,16,,12

output.csv is based on controlfile.txt

Please suggest some ideas. I hope sed can help me achieve this!

Scrutinizer · September 18, 2014, 4:17am

That will be difficult to do that way, because cut does not keep the requested order of columns. You would also need to somehow put the columns in file 2 into shell variables first. That will become a bit cumbersome, with a lot of external programs.

An alternative would be to use awk, for example:

awk 'NR==FNR{split($0,P); next}{split($0,F); $0=x; for(i in P) $i=F[P[i]]}1' FS=, OFS=, controlfile.txt sample.csv
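
For readers who find the one-liner dense, here is a longer sketch of the same idea: read the control file first, then emit each source row in control-file order, leaving -1 positions empty. The scratch directory, the array name pos, and the range check on column numbers are illustrative assumptions, not part of the command above; the sample data is copied from the first post.

```shell
#!/bin/sh
# Recreate the thread's sample data in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
printf '%s\n' '1,3,5,-1,8,-1,4' > "$workdir/controlfile.txt"
printf '%s\n' 'col1,col2,col3,col4,col5,col6,col7,col8' \
              '1,2,3,4,5,6,7,8' \
              '9,10,11,12,13,14,15,16' > "$workdir/sample.csv"

awk -F, -v OFS=, '
    # First file: remember the wanted source column for each output position.
    NR == FNR { n = split($0, pos, ","); next }
    # Second file: build each output line in control-file order.
    {
        line = ""
        for (i = 1; i <= n; i++) {
            # -1 (or any column the record does not have) yields an empty field.
            val = (pos[i] >= 1 && pos[i] <= NF) ? $(pos[i]) : ""
            line = line (i > 1 ? OFS : "") val
        }
        print line
    }
' "$workdir/controlfile.txt" "$workdir/sample.csv" > "$workdir/output.csv"

cat "$workdir/output.csv"
```

Both files are read once and no intermediate files are created. Note that blank columns come out empty in the header row as well, rather than as a literal -1.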

Don_Cragun · September 18, 2014, 4:46am

Note, however, that if you are using a UNIX system (rather than a Linux system), awk probably won't work if any lines in your input file are longer than 2048 bytes (actually whatever number is returned by getconf LINE_MAX). With hundreds of input fields, this may be a problem depending on what OS you're using.
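
One rough way to check whether this limit matters for a given file is to compare its longest line against getconf LINE_MAX. The sketch below is only an illustration: the helper name longest_line and the default file name are assumptions, and the loop uses the shell's read rather than awk so that the check does not itself depend on awk's line limit.

```shell
#!/bin/sh
# Print the length in bytes of the longest line in a file (newline excluded).
# Hypothetical helper; it runs in a subshell so LC_ALL=C does not leak out.
longest_line() (
    LC_ALL=C    # count bytes rather than multibyte characters
    max=0
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "${#line}" -gt "$max" ]; then
            max=${#line}
        fi
    done < "$1"
    echo "$max"
)

file=${1:-sample.csv}    # assumed default; pass your own file as the first argument
limit=$(getconf LINE_MAX)

if [ -r "$file" ]; then
    longest=$(longest_line "$file")
    # LINE_MAX includes room for the trailing newline, hence the +1.
    if [ $((longest + 1)) -gt "$limit" ]; then
        echo "$file: longest line is $longest bytes, above LINE_MAX ($limit)"
    else
        echo "$file: longest line is $longest bytes, within LINE_MAX ($limit)"
    fi
fi
```

The shell loop is slow on large files, so this is meant as a one-off check rather than something to run on every invocation.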