Good afternoon team,
I have export1.csv (2m records) and export2.csv (3m records) with the same column structure. The data is unordered, and no row in one file exactly matches a row in the other. And yes (much to my chagrin), the data comes pre-packaged with commas inside the fields.
file 1
CN=username3,OU=TestOU,DC=company,DC=com, CN=username3,OU=TestOU,DC=company,DC=com, meta8907435339
CN=username4,OU=TestOU,DC=company,DC=com, CN=username4,OU=TestOU,DC=company,DC=com, meta9084538488
CN=username1,OU=TestOU,DC=company,DC=com, CN=username1,OU=TestOU,DC=company,DC=com, meta0010193834
CN=username2,OU=TestOU,DC=company,DC=com, CN=username2,OU=TestOU,DC=company,DC=com, meta8974583475
file 2
CN=username2,OU=TestOU,DC=company-TEST,DC=com, CN=username2,OU=TestOU,DC=company-TEST,DC=com, meta0934530054
CN=username1,OU=TestOU,DC=company-TEST,DC=com, CN=username1,OU=TestOU,DC=company-TEST,DC=com, meta6546547888
CN=username5,OU=TestOU,DC=company-TEST,DC=com, CN=username5,OU=TestOU,DC=company-TEST,DC=com, meta6542134546
CN=username4,OU=TestOU,DC=company-TEST,DC=com, CN=username4,OU=TestOU,DC=company-TEST,DC=com, meta4654688798
CN=username3,OU=TestOU,DC=company-TEST,DC=com, CN=username3,OU=TestOU,DC=company-TEST,DC=com, meta5454654987
For my purposes, there are four matching rows in the examples above if we exclude the LAST 22 characters of either col1 or col2. I only need the non-matching lines (i.e., file 2, row 3) to be output.
I've considered something along these lines:
awk 'NR==FNR { arr[$0]="1";next } arr[$1]!="1" { print $0 }' file2 file1 > file3
But I don't know how to modify the filter in the precise way that I need. I have no idea where to start when I need to exclude the last N chars of row 2, match on the remainder, and direct the output to a file, with disordered data to boot. I'm stuck.
Any help would be kindly appreciated!
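
For reference, here is a minimal sketch of the kind of filter described above. It is not a definitive answer and rests on several assumptions. It assumes the columns are separated by a comma followed by a space, as in the samples. It assumes col1 alone is enough to identify a row. It also assumes the part to ignore is everything from the first ",DC=" onward. That last assumption is swapped in for the fixed character count because, in the sample rows, the trailing domain portion is 18 characters long in file 1 and 23 in file 2, so a single N would not line the two files up. The names file1, file2, and file3 follow the attempt above; the real inputs would be export1.csv and export2.csv.

```shell
# Sample rows copied from the post, so the sketch can be run as-is.
cat > file1 <<'EOF'
CN=username3,OU=TestOU,DC=company,DC=com, CN=username3,OU=TestOU,DC=company,DC=com, meta8907435339
CN=username4,OU=TestOU,DC=company,DC=com, CN=username4,OU=TestOU,DC=company,DC=com, meta9084538488
CN=username1,OU=TestOU,DC=company,DC=com, CN=username1,OU=TestOU,DC=company,DC=com, meta0010193834
CN=username2,OU=TestOU,DC=company,DC=com, CN=username2,OU=TestOU,DC=company,DC=com, meta8974583475
EOF
cat > file2 <<'EOF'
CN=username2,OU=TestOU,DC=company-TEST,DC=com, CN=username2,OU=TestOU,DC=company-TEST,DC=com, meta0934530054
CN=username1,OU=TestOU,DC=company-TEST,DC=com, CN=username1,OU=TestOU,DC=company-TEST,DC=com, meta6546547888
CN=username5,OU=TestOU,DC=company-TEST,DC=com, CN=username5,OU=TestOU,DC=company-TEST,DC=com, meta6542134546
CN=username4,OU=TestOU,DC=company-TEST,DC=com, CN=username4,OU=TestOU,DC=company-TEST,DC=com, meta4654688798
CN=username3,OU=TestOU,DC=company-TEST,DC=com, CN=username3,OU=TestOU,DC=company-TEST,DC=com, meta5454654987
EOF

# Split fields on comma+space (assumption), so commas inside a DN stay in one field.
awk -F', ' '
  # Comparison key: col1 with everything from the first ",DC=" removed (assumption).
  { key = $1; sub(/,DC=.*/, "", key) }
  # While reading the first file, remember each key and print nothing.
  NR == FNR { seen[key] = 1; next }
  # While reading the second file, print rows whose key was never seen.
  !(key in seen)
' file1 file2 > file3
```

Because the keys from file1 are held in an array, row order in either file does not matter. As written, this only reports rows of file2 that have no counterpart in file1; running it again with the two input files swapped would report the reverse direction. If a fixed character count really is wanted, the key line could instead be key = substr($1, 1, length($1) - n) with n passed in via -v n=..., though that only works when the ignored suffix has the same length in both files.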