Join files, omit duplicated records from one file

Hello

I have 2 files, e.g.:

more file1 file2
::::::::::::::
file1
::::::::::::::
1   fromfile1
2   fromfile1
3   fromfile1
4   fromfile1
5   fromfile1
6   fromfile1
7   fromfile1
::::::::::::::
file2
::::::::::::::
3   fromfile2
5   fromfile2

I want to merge these, but where a key appears in both files, include only the record from the second file. So the result is:

1   fromfile1
2   fromfile1
3   fromfile2
4   fromfile1
5   fromfile2
6   fromfile1
7   fromfile1

Basically, I want to merge the 2 files but omit any record in file1 whose key field also appears in file2.

I've started to cobble together a script that:

  • makes a list of key fields from file2
  • loops over that list, using grep -v to remove records with each key from file1
  • then uses uniq -d to keep only records which were duplicated (so I now have a copy of file1 with only records 1, 2, 4, 6, 7)
  • then concatenates this file and file2

This only works if file2 has exactly 2 records.
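A minimal sketch of the grep -v idea from the steps above that avoids both the per-key loop and the uniq -d step. It assumes whitespace-separated records whose first field is a key containing no regular-expression special characters; the sample files and the pattern file name keys.pat are illustrative. grep -f is in POSIX, but on older Solaris releases /usr/xpg4/bin/grep may be needed instead of /usr/bin/grep (worth checking on your system).

```shell
# Recreate the sample data from the question in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' '1   fromfile1' '2   fromfile1' '3   fromfile1' '4   fromfile1' \
  '5   fromfile1' '6   fromfile1' '7   fromfile1' > file1
printf '%s\n' '3   fromfile2' '5   fromfile2' > file2

# Step 1: turn each key in file2 into an anchored pattern such as "^3[[:space:]]",
# so "3" only matches the key field, not "13" or text later in the line.
awk '{ print "^" $1 "[[:space:]]" }' file2 > keys.pat

# Step 2: drop all of those keys from file1 in a single grep call,
# append the whole of file2, and sort numerically on the key.
result=$( { grep -v -f keys.pat file1; cat file2; } | sort -n )
printf '%s\n' "$result"
```

Because all of file2 is appended, a key that appears only in file2 is kept as well; sort -n assumes the keys are numeric.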

This feels like it should be simple, but I can't figure it out. I suspect I should be able to use join or maybe awk to achieve what I want, but I can't get there and can't find anything through Google.

Can anyone suggest a more elegant solution than my approach? (And frankly one which works, because mine doesn't.)

Many thanks, Chris

Hello CHoggarth,

Could you please try the following and let me know if this helps you.

awk 'FNR==NR{a[$1]=$0;next} ($1 in a){print a[$1];next} 1'  Input_file2  Input_file1

Output will be as follows.

1   fromfile1
2   fromfile1
3   fromfile2
4   fromfile1
5   fromfile2
6   fromfile1
7   fromfile1
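For readers new to the idiom, here is the same awk program spread over several lines with comments. The logic is unchanged; the sample files are recreated from the question so the sketch runs on its own.

```shell
# Recreate the sample data in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' '1   fromfile1' '2   fromfile1' '3   fromfile1' '4   fromfile1' \
  '5   fromfile1' '6   fromfile1' '7   fromfile1' > Input_file1
printf '%s\n' '3   fromfile2' '5   fromfile2' > Input_file2

result=$(awk '
  # FNR==NR is only true while reading the first file named (Input_file2):
  # store each whole record under its key (field 1) and skip to the next line.
  FNR == NR { a[$1] = $0; next }
  # Reading Input_file1: if this key was seen in Input_file2, print that record instead.
  ($1 in a) { print a[$1]; next }
  # Any other Input_file1 record: the bare pattern 1 is always true, so print it as is.
  1
' Input_file2 Input_file1)
printf '%s\n' "$result"
```

One caveat of the FNR==NR idiom: if Input_file2 is empty, the condition is also true while reading Input_file1, so every record would be stored and nothing printed.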
 

Thanks,
R. Singh


Try also

sort file[12] | uniq -uw1 | sort - file2

1   fromfile1
2   fromfile1
3   fromfile2
4   fromfile1
5   fromfile2
6   fromfile1
7   fromfile1
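A step-by-step version of the pipeline above, keeping the intermediate results in files (step1 and step2 are names chosen here for illustration). It assumes GNU uniq, since -w is a GNU extension rather than POSIX, and one-character keys, since -w1 compares only the first character of each line.

```shell
# Recreate the sample data in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
export LC_ALL=C   # plain byte-order sorting, so the result does not depend on locale
printf '%s\n' '1   fromfile1' '2   fromfile1' '3   fromfile1' '4   fromfile1' \
  '5   fromfile1' '6   fromfile1' '7   fromfile1' > file1
printf '%s\n' '3   fromfile2' '5   fromfile2' > file2

# Step 1: sort both files together, so records sharing a key end up adjacent.
sort file1 file2 > step1
# Step 2: -u keeps only lines with no matching neighbour; -w1 compares just the
# first character, so a shared key removes both the file1 and the file2 record.
uniq -u -w1 step1 > step2
# Step 3: step2 holds every record whose key occurs only once; sort it together
# with all of file2 (in the one-liner, "-" stands for standard input).
result=$(sort step2 file2)
printf '%s\n' "$result"
```

With the extra 8 record from the follow-up below, step 2 would keep it (its key is unique) and step 3 would add file2's copy again, so it would appear twice in the output.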

Ravinder, thanks. This looks great when I change awk to nawk, at least as far as my spec went. I now realise I need to go back to the business and find out whether file2 might include records with keys that are not in file1 at all. If I try that with your solution, those records are not included. Is there an easy amendment to your solution? E.g. file2 also includes a record:

8     fromfile2
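One possible amendment, sketched here rather than taken from the thread: also remember the order of file2's keys, mark which ones were used while reading file1, and print the unused ones at the end. It sticks to POSIX awk features (arrays, in, END), so it is intended to run under nawk too, though that has not been checked on Solaris. The array names rec, order and used are arbitrary.

```shell
# Recreate the sample data in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' '1   fromfile1' '2   fromfile1' '3   fromfile1' '4   fromfile1' \
  '5   fromfile1' '6   fromfile1' '7   fromfile1' > file1
# file2 now also has a key (8) that does not occur in file1.
printf '%s\n' '3   fromfile2' '5   fromfile2' '8     fromfile2' > file2

result=$(awk '
  # First file (file2): remember each record by key, plus the order keys first appear.
  FNR == NR { if (!($1 in rec)) order[++n] = $1; rec[$1] = $0; next }
  # Second file (file1): print the file2 record when the key matches, and mark it used.
  ($1 in rec) { print rec[$1]; used[$1] = 1; next }
  # Unmatched file1 records are printed unchanged.
  { print }
  # Finally, append file2 records whose key never appeared in file1, in file2 order.
  END { for (i = 1; i <= n; i++) if (!(order[i] in used)) print rec[order[i]] }
' file2 file1)
printf '%s\n' "$result"
```

Keys that exist only in file2 are appended after all of file1; piping the result through sort -n would instead put them in key order, assuming numeric keys.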

RudiC - thanks for the reply. I should have said I'm working on Solaris. Unfortunately uniq doesn't have a -w switch on my machine.

Chris

Hi.

Some versions of Solaris may have GNU uniq et al. installed, for example:

OS, ker|rel, machine: SunOS, 5.11, i86pc
Distribution        : Solaris 11.3 X86
guniq uniq (GNU coreutils) 8.16
gawk GNU Awk 3.1.8
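If the GNU copy is installed as guniq, as in the listing above, RudiC's pipeline can call it directly. A small sketch that picks guniq when present and otherwise falls back to uniq (which must then itself be GNU uniq for -w to work); it assumes a POSIX shell with command -v, such as ksh93, and recreates the sample files so it runs on its own.

```shell
# Recreate the sample data in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
export LC_ALL=C   # plain byte-order sorting, so the result does not depend on locale
printf '%s\n' '1   fromfile1' '2   fromfile1' '3   fromfile1' '4   fromfile1' \
  '5   fromfile1' '6   fromfile1' '7   fromfile1' > file1
printf '%s\n' '3   fromfile2' '5   fromfile2' > file2

# Prefer GNU uniq installed as guniq; otherwise assume plain uniq supports -w.
if command -v guniq >/dev/null 2>&1; then
  UNIQ=guniq
else
  UNIQ=uniq
fi

# Same pipeline as in the thread, with the chosen uniq binary.
result=$(sort file[12] | "$UNIQ" -u -w1 | sort - file2)
printf '%s\n' "$result"
```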

Best wishes ... cheers, drl