Hi Shash,
Your desired output doesn't make sense to me. When lines are found in both files for a given key, the data from the fields in both files is intermixed. When data is found in only one file, why shouldn't the output hold the data from each field of that input file in the output columns that correspond to those input fields? In other words, with your sample input data, why isn't the output:
Cu1L1B1L2B2
Cu2L1  L2
Cu3L1B1L2B2
instead of the output you said you want:
Cu1L1B1L2B2
Cu2L1L2
Cu3L1B1L2B3
especially since there is no B3 anywhere in either of your sample input files?
To get the output shown above, you could use something like:
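awk '
# First file: key is columns 1-3; save the 2-character fields at columns 8 and 10.
FNR == NR {
    data1[key = substr($0, 1, 3), 1] = substr($0, 8, 2)
    data1[key, 2] = substr($0, 10, 2)
    keys[key]
    next
}
# Second file: key is columns 1-3; save the 2-character fields at columns 4 and 6.
{
    data2[key = substr($0, 1, 3), 1] = substr($0, 4, 2)
    data2[key, 2] = substr($0, 6, 2)
    keys[key]
}
# One output line per key; a field missing from either file prints as two spaces.
END {
    for(key in keys)
        printf("%s%2.2s%2.2s%2.2s%2.2s\n", key, data1[key, 1],
            data2[key, 1], data1[key, 2], data2[key, 2])
}' file1.txt file2.txt > output.txt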
but note that the order of the output lines may vary, because awk does not define the order in which for(key in keys) visits the array elements. If the output order matters, you need to clearly state how the output order should be determined when:
- file1.txt contains keys that do not appear in file2.txt (as in your example),
- file2.txt contains keys that do not appear in file1.txt, and
- both files contain keys that do not appear in the other file.
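As one possible answer to the ordering question, here is a minimal sketch that prints keys in the order they are first seen (all of file1.txt, then any keys found only in file2.txt). That ordering rule is my assumption, not something you stated. The two sample files created at the top are hypothetical stand-ins that only match the column positions used by the code above, and remember(), seen[], order[] and n are names I made up for this example.

```shell
# Hypothetical sample inputs: key in columns 1-3;
# file1 data in columns 8-11, file2 data in columns 4-7.
printf '%s\n' 'Cu1xxxxL1L2' 'Cu2xxxxL1L2' 'Cu3xxxxL1L2' > file1.txt
printf '%s\n' 'Cu1B1B2' 'Cu3B1B2' > file2.txt

awk '
# Record each key once, in the order it is first seen.
function remember(k) {
    if (!(k in seen)) {
        seen[k]
        order[++n] = k
    }
}
FNR == NR {
    key = substr($0, 1, 3)
    data1[key, 1] = substr($0, 8, 2)
    data1[key, 2] = substr($0, 10, 2)
    remember(key)
    next
}
{
    key = substr($0, 1, 3)
    data2[key, 1] = substr($0, 4, 2)
    data2[key, 2] = substr($0, 6, 2)
    remember(key)
}
END {
    # Walk the keys by index instead of using for (key in ...),
    # so the output order is the first-seen order.
    for (i = 1; i <= n; i++) {
        key = order[i]
        printf("%s%2.2s%2.2s%2.2s%2.2s\n", key, data1[key, 1],
            data2[key, 1], data1[key, 2], data2[key, 2])
    }
}' file1.txt file2.txt > output.txt
```

If you would rather have the output sorted by key, piping the output of the original script through sort may be simpler than tracking the order inside awk.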
Note that I asked what operating system you're using and you didn't answer.
If you're using a Solaris/SunOS system and want to try the above code, change awk to /usr/xpg4/bin/awk or nawk.
The missing 2nd components of the 2nd data1[] and data2[] assignments have been fixed as noted by RudiC in post #5.