All,
I suspect this kind of question has been asked before, but I was unable to find it after a thorough search.
Here is my request.
I have many files; two sample files are provided below.
File-1 (with A,B as column headers)
A,B
1,2
File-2 (with C, D as column headers)
C,D
4,5
I need the below output
A,B,C,D
1,2, ,
, ,4,5
I actually got a perl script to concatenate the files the way I want, but it shuffles the column order randomly.
Why do some of your output fields have <space> as a replacement for data that is not present (row 2, column C; row 3, column A; and row 3, column B) and other output fields have nothing as a replacement for data that is not present (row 2, column D)?
Why isn't the desired output just:
A,B,C,D
1,2,4,5
?
Are the column headers between the two files always distinct? If not, how do common column headers affect the desired output?
Are there always just one header line and one data line in each input file? If not, how is the output supposed to be ordered? All lines from File-1 followed by all lines from File2? Alternating lines from File-1 and File-2? Does something different happen if the number of lines in the two files differs?
I'm guessing that you might have common column headings somewhere that you want to merge in, but in that case, what rule would you want to apply:
file1 always?
file2 always?
file2 if value is bigger/smaller?
value only if matched?
the sequence in which records were added?
anything else?
Is this something that could be covered by the paste command, perhaps? You might need to prefix your file2 data with ,, to make the column counts match, perhaps with sed -i '2,$ s/^/,,/' file2 or some variation.
Does that help? It still doesn't get over the question of matching columns and might not do what you want at all. It would help if we had more test data and the various options explained.
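For what it's worth, here is a minimal sketch of that idea, assuming two files literally named file1 and file2 with the sample contents from the first post (those names are assumptions). Note that paste joins lines side by side, so for the row-stacked output shown, combining the padded files by plain concatenation seems closer to the goal:

```shell
# Sketch only: file1, file2, and merged.csv are assumed names.
printf 'A,B\n1,2\n' > file1
printf 'C,D\n4,5\n' > file2

# Pad file2's data rows with leading empty columns
# (non-in-place variant of the sed -i suggestion above).
sed '2,$ s/^/,,/' file2 > file2.padded

{
  head -1 file1 | tr -d '\n'          # A,B without its newline
  printf ','                          # joining comma
  head -1 file2                       # C,D  -> header line A,B,C,D
  tail -n +2 file1 | sed 's/$/,,/'    # file1 rows padded with trailing ,,
  tail -n +2 file2.padded             # file2 rows padded with leading ,,
} > merged.csv
cat merged.csv
```

This produces the desired A,B,C,D / 1,2,, / ,,4,5 layout, but it hard-codes two files with known, distinct headers; it does not address common column names at all.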
Hi Don,
Thanks for going through my request. I am very sorry for my late reply.
What I provided earlier was a simple case.
Here is the desired output for my previous query:
A,B,C,D
1,2,,
,,4,5
My actual files contain many columns (100+), and some of the columns have the same names.
Here are modified sample inputs from two files.
In my actual case I have many files.
Hello rbattle1,
Yes, I sometimes have common column headers between different files that I want to row-bind.
Please check my updated post for my requirements.
#!/bin/ksh
IAm=${0##*/}
if [ $# -lt 2 ]
then	printf 'Usage: %s file1 ... fileN\n' "$IAm" >&2
	exit 1
fi
awk '
BEGIN {	FS = OFS = ","
	# Gather field names from the 1st line of each input file...
	for(i = 1; i < ARGC; i++) {
		if((getline < ARGV[i]) != 1) {
			printf("*** Cannot read header from file \"%s\".\n",
				ARGV[i])
			continue
		}
		close(ARGV[i])
		for(j = 1; j <= NF; j++)
			if($j in H)
				continue
			else {	H[$j] = ++Hc
				OH[Hc] = $j
				# Add to output file header.
				printf("%s%s", HOFS, $j)
				HOFS = OFS
			}
	}
	print ""
}
FNR == 1 {
	# Clear output order list and headers from the previous file.
	for(i in O)
		delete O[i]
	for(i in FH)
		delete FH[i]
	# Determine the order in which input fields should be output for this file...
	for(i = 1; i <= NF; i++)
		FH[$i] = i
	for(i = 1; i <= Hc; i++)
		O[i] = (OH[i] in FH) ? FH[OH[i]] : NF + 1
	next
}
{	# Print the data lines for the current input file.
	for(i = 1; i <= Hc; i++)
		printf("%s%s", $O[i], (i == Hc) ? ORS : OFS)
}' "$@"
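One awk behavior the script leans on: any output column that is missing from the current file is mapped to field number NF + 1, and referencing a field past NF in awk yields the empty string, which prints as an empty CSV cell. A quick standalone illustration:

```shell
# Referencing a field beyond NF yields "" in awk,
# so missing columns come out as empty CSV cells.
printf '1,2\n' | awk -F, '{ printf "%s|%s|%s\n", $1, $2, $(NF + 1) }'
# prints: 1|2|
```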
This was written and tested using a Korn shell, but should work with any shell that processes POSIX-standard shell variable expansions.
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk. (I don't remember whether or not nawk on Solaris systems includes support for the awk ARGV[] array that is required for this script to work correctly.)