palex
April 7, 2017, 11:38pm
1
Hello,
I am looking to output all of the lines from file2 whose 11th field is present in the first field in file1. Then the second field from file1 should be appended as such:
file1:
2222 0.35
4444 0.25
5555 0.75
file2:
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 1111
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 2222
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 3333
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 4444
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 5555
Desired output:
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 2222 0.35
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 4444 0.25
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 5555 0.75
Thanks so much!
Hello palex,
Could you please try following and let me know if this helps you.
awk 'FNR==NR{A[$1]=$0;next} ($NF in A){print $0,A[$NF]}' Input_file1 Input_file2
Thanks,
R. Singh
1 Like
palex
April 8, 2017, 5:49pm
3
Column 11 of file2 was duplicated in the output, but this works for me. Thank you so much!
RudiC
April 9, 2017, 3:29am
4
Try, then,
awk 'FNR==NR{A[$1]=$2;next} ($NF in A){print $0,A[$NF]}' file1 file2
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 2222 0.35
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 4444 0.25
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 5555 0.75
1 Like
drl
April 9, 2017, 7:54am
5
Hi.
If the input files are sorted on the fields to be matched, then one can use:
join [options] <files>
like this:
#!/usr/bin/env bash
# @(#) s1 Demonstrate blending matched-field files, join.
# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C join pass-fail
E=expected-output.txt
# Remove old results file.
rm -f f1
pl " Input data files data1, data2:"
head data[12]
pl " Expected output:"
cat $E
# output all of the lines from file2 whose 11th field is present
# in the first field in file1
pl " Results:"
format="2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,2.10,1.1,1.2"
join -t " " -1 1 -2 11 -o "$format" data1 data2 |
tee f1
pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f $C ] && $C || ( pe; pe " Results cannot be verified." ) >&2
exit 0
producing:
$ ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution : Debian 8.7 (jessie)
bash GNU bash 4.3.30
join (GNU coreutils) 8.23
pass-fail (local) 1.9
-----
Input data files data1, data2:
==> data1 <==
2222 0.35
4444 0.25
5555 0.75
==> data2 <==
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 1111
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 2222
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 3333
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 4444
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 5555
-----
Expected output:
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 2222 0.35
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 4444 0.25
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 5555 0.75
-----
Results:
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 2222 0.35
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 4444 0.25
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 5555 0.75
-----
Verify results if possible:
-----
Comparison of 3 created lines with 3 lines of desired results:
Succeeded -- files (computed) f1 and (standard) expected-output.txt have same content.
See man join
and experiment.
Best wishes ... cheers, drl
1 Like