I've got two large CSV text table files, each with a different number of columns.
I need to compare them on the first two columns and create a resulting file
that, wherever the first two columns match, includes all values from the first file and all values (except the first two columns) from the second. I searched forums as well as Google and was unable to find a solution for this.
Thank you both very much - your code helped me resolve the issue I had. If possible, could you please comment your code so I can better understand how this script works... I am still learning awk.
NR == FNR {                  # while reading the first file,
    a[$1 FS $2] = $0         #+ load array a, indexed by field 1, FS (a space by default), field 2;
    next                     #+ then jump to the next record, skipping the remaining actions for this one.
}
($1 FS $2) in a {            # Processing of file2 starts here. If "field 1, space, field 2" is an index in array a,
    printf "%s", a[$1 FS $2] #+ print the whole record ($0) of file1 saved above in array a
                             #+ ("%s" is needed so a record containing "%" is not treated as a format string);
    sub($1 FS $2, "")        #+ delete field 1 and field 2 (including the space between them) from the current record;
    print                    #+ print what is left of the current file2 record.
}
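For reference, here is a minimal self-contained run of the script above. The sample file names and contents (`file1`, `file2`) are made up for illustration; the awk logic is the same as the commented version:

```shell
#!/bin/sh
# Hypothetical sample data: file1 has 3 columns, file2 has 4.
cat >file1 <<'EOF'
a 1 red
b 2 blue
c 3 green
EOF
cat >file2 <<'EOF'
a 1 x y
c 3 z w
EOF

# Same logic as above, written inline on the command line.
awk '
NR == FNR { a[$1 FS $2] = $0; next }
($1 FS $2) in a {
    printf "%s", a[$1 FS $2]   # record from file1
    sub($1 FS $2, "")          # drop the key fields from the file2 record
    print                      # append the rest of the file2 record
}' file1 file2
# Prints:
# a 1 red x y
# c 3 green z w
```

Note that "b 2 blue" is dropped because file2 has no record with that key, which matches the requirement of only keeping rows where the first two columns match.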
You can also combine the first 2 fields so that join treats them as a single key, use join (a sort is probably necessary), and finally split the first field back into 2:
#!/usr/bin/env bash
# @(#) s2 Demonstrate join on 2 adjacent fields.
echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) sed join
set -o nounset
echo
FILE1=data1
FILE2=data2
echo " Data file $FILE1:"
cat $FILE1
echo
echo " Data file $FILE2:"
cat $FILE2
# Pre-process input / sort, join / post-process output.
echo
echo " Results:"
sed 's/ /_/' $FILE1 | sort -k1,1 >t1
sed 's/ /_/' $FILE2 | sort -k1,1 >t2
join -j 1 t1 t2 | sed 's/_/ /'
echo
echo " Expected output:"
cat expected-output.txt
exit 0
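Stripped of the display scaffolding, the core of that script is just three commands. A minimal sketch with hypothetical data files (and assuming the data itself contains no underscores, since "_" is used as the temporary glue character):

```shell
#!/bin/sh
# Hypothetical sample data (space-separated).
cat >data1 <<'EOF'
a 1 red
b 2 blue
c 3 green
EOF
cat >data2 <<'EOF'
a 1 x y
c 3 z w
EOF

# Glue fields 1 and 2 with "_" so join sees a single key field,
# sort both files on that key, join, then split the key back apart.
sed 's/ /_/' data1 | sort -k1,1 >t1
sed 's/ /_/' data2 | sort -k1,1 >t2
join -j 1 t1 t2 | sed 's/_/ /'
# Prints:
# a 1 red x y
# c 3 green z w
```

Both sed invocations only touch the first occurrence on each line (no `g` flag), which is exactly what makes the glue-then-split trick safe for the remaining columns.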