How to copy a column of multiple files and paste into new excel file (next to column)?

Try the -r option to sort . And, e.g. man sort sometimes helps.

There are several easy options here...

Change the order of the operands given to paste :

 > RES; for FN in $(ls *.xls | sort -t. -k1.4n); do cut -f1,5 $FN | paste RES - > TMP; mv TMP RES; done

Reverse the sort order as RudiC suggested:

 > RES; for FN in $(ls *.xls | sort -t. -k1.4nr); do cut -f1,5 $FN | paste - RES > TMP; mv TMP RES; done

In both of the above you can replace ls *.xls with printf '%s\n' *.xls to run a little bit faster (using a shell built-in instead of invoking ls ).

If you know that there will always be at least 10 files to be processed and that there will never be more than 99 files to process, get rid of the ls and the sort :

 > RES; for FN in org?.xls org??.xls; do cut -f1,5 $FN | paste RES - > TMP; mv TMP RES; done

Fix the minor logic problem in what Ravinder suggested and use one invocation of awk instead of thirty invocations of paste :

awk -v OFS='\t' '
{	line[FNR] = ((NR == FNR) ? "" : line[FNR] OFS) $1 OFS $5 }
END {	for(i = 1; i <= FNR; i++) print line }
' $(printf '%s\n' org*.xls | sort -t. -k1.4n) > concats.xls

or:

awk -v OFS='\t' '
{	line[FNR] = ((NR == FNR) ? "" : line[FNR] OFS) $1 OFS $5 }
END {	for(i = 1; i <= FNR; i++) print line }
' org?.xls org??.xls > concats.xls
3 Likes

Dear Don,
There is a minor issue with all those commands mentioned above. As I told, I have 30 excel files. among them few excel files have lesser rows than others. For example excel file 6 to 10 has only 2 rows, whereas other files has 3 rows as shown below.

org1_1	1	1	2.5	100
org1_2	1	2	5.5	98	
org1_3	1	3	7.2	88
.
.
.
org10_1	1	1	2.5	100
org10_2	2	5.5	98	98
.
.
.
org11_1	1	1	3.5	100
org11_2	1	2	8.5	77	
org11_3	1	3	7.2	88

When I execute above commands, it pasting 11th file first row along with 10th file 3rd row as given below,

org1_1	100 . . . . . org10_1	100		org11_2	77
org1_2	98  . . . . . org10_2	98		org11_3	88
org1_3	88  . . . . . 		org11_1	100

Sorry for the delayed response.

Hi dineshkumarsrk,
There is no reason for you to apologize for a delayed response. We are trying to help you solve your problem. If you aren't in a hurry to get something that works, it doesn't make any difference to us.

We might note, however, that all of the suggestions I made work perfectly with every example that was provided in the first twenty-one posts in this thread. In those posts you had shown us the contents of three files and the names of twenty-seven other files. Never was anything said about the number of lines in some of the files being different than the number of lines in other files and, as you have seen, that affects the output that the script you showed us in post #1 would work and it affects the output that every single script suggested in this thread works (since we all followed your example to try to get the output you wanted).

Note that we are here to help you learn how to write code to satisfy your requirements; we are not here to act as your unpaid programming staff with continually changing requirements. Given this new information that files 6 to 10 (is it always 6 to 10, or s 6 to 10 just as an example; is it org6.xls through org10.xls inclusive or does org10.xls go back to three lines) may have different numbers of lines than other files with no prior knowledge about which files are "short" or "long", how would you propose modifying any of the scripts suggested in post #22 to get the output you want based on this new information?

1 Like

I have been using your script without any issue, until I had a data set with different row numbers (I mentioned above). So far, I had processed more than 5 groups of data sets (in each group I had 30 excel files), where in I did not face this kind of problem. Moreover, even I do not expect that, I would face this kind of issue. I thought, my data files have the same number of rows and columns. But, while processing multiple data files, I came to know that it differs. This is why I could not mention it earlier.

 Given this new information that files 6 to 10 (is it always 6 to 10, or s 6 to 10 just as an example; 

It is an example only. My data set would not be same as I mentioned above. In some cases, each file differ in terms of row number (number of rows are not same for all the files). Whereas, the number of columns are same for all the files.

is it org6.xls through org10.xls inclusive or does org10.xls go back to three lines) may have different numbers of lines than other files with no prior knowledge about which files are "short" or "long", how would you propose modifying any of the scripts suggested in post #22 to get the output you want based on this new information?

I need to copy only the field/column number 1 and 5 regardless of the number of rows each file have. It would be tedious to know which file is short and which file is long when I have 50 or 60 excel files.
If it a prerequisite to know, which file is short and which file is long to write a script for the same, I would not have asked for help.

You never said that any of the code that I had previously suggested worked for any of your data sets. You only said that there was a problem because the number of lines in files 6 through 10 only had two lines instead of three.

One possible solution would be to add empty rows to all of the files that have fewer rows than the longest file. If you did that as a first step, all of the existing scripts might do what you wanted. And, if you knew that files 6 - 10 would always contain one line less than the other files, it would be very easy to add an empty line to those four or five files without making the other processing any more complex.

Assuming that you will always have at least one file that is named orgdigits.xls where digits is a single digit and at least one file where digits is two digits and that you will never have any files where digits is more than two digits, one could try:

awk -v OFS='\t' '
FNR == 1 {
	file_cnt++
}
{	data[file_cnt, 1, FNR] = $1
	data[file_cnt, 2, FNR] = $5
	if(FNR > line_cnt)
		line_cnt = FNR
}
END {	for(i = 1; i <= line_cnt; i++)
		for(j = 1; j <= file_cnt; j++)
			printf("%s%s%s%s",  data[j, 1, i], OFS, data[j, 2, i],
			    (j == file_cnt) ? ORS : OFS)
}
' org?.xls org??.xls > concats.xls

but I am disappointed that you were unwilling to make any attempt to solve this new wrinkle on your own.

With the other suggestions in post #22, I assume that you would be able to modify the last line of this code to handle any number of input files.

2 Likes

Dear don,
Yes indeed, I could modify the last line of the code as per my need. Your code works perfectly fine. I have realised that, I need to fix the problem by myself. Thank you for your critical comments, suggestions and help.