For work i am trying to generate a combined csv file excisting out of 2 other csv files.
The problem i am facing is that the first field on both files have multiple values in there which arent always the same. This first field is also the joining part.
The layout of the files is as follows:
File1: EAN, Articlenr(our own)
File2: EAN, Articlenr(Supplier), SKU, Stock, Price
The first field can contain more then 1 EAN code in the following format:
HUGE ASSUMPTION WARNING
I am assuming that the EAN code is unique across both files for any one product...
ALSO UNTESTED CODE WARNING, but the germ of an idea
perl -ne '
BEGIN{
open (my $our_file,"<","/path/to/file1/with/just/ean/and/sku");
while (<$our_file>){
my@r=split/;/,$_;
for my $ean (split/,/,$r[0]){
$rec1{$ean}=$_;
}
}
}
@r=split/;/,$_;
@ean=split/,/,$r[0];
for $ean (@ean){
if ($rec1{$ean}){
@p=split/;/,$rec1{$ean};
@our_eans=split/,/,$p[0];
my %eans;
for (@ean,@our_ean){
$eans{$_}++;
}
print join(";",join(",",keys %eans),$p[1],@r[1,2,3]),"\n";
}
}' path/to/file2 >merged_file.csv
I am confuse here by seeing your sample output, as follows are the comments on same.
Barcode;Art_ours;Art_Sup;SKU;stock;price;manufacturer
4960999865300;testnderp;8935214;0023030102;555;342.70;REV
### Above line you are comparing the first columns of File1 and File2.
1230000000010, 1240000000010;ND010;12345;99263999;555;33.01;Pac
### Above line you are NOT comparing the first column of File1 and File2?
If you want to compare always the first columns of both the Input_files then following may help you.
awk -F"[,|;]" 'FNR==NR{A[$1]=$0;next} ($1 in A){q=$1;sub($1,X);print A[q] $0}' Input_file1 Input_file2
If you have some other conditions with your query, I request you to please let us know the complete details on same, it will be helpful for us to help you. Hope this helps.
There are 3 lines that contain the same EAN code in both files so those should be able to join together.
---------- Post updated at 11:28 AM ---------- Previous update was at 11:24 AM ----------
Sorry that it looked confusing. I had to manual type them over from the shell. The first column is the one that it should join together as long as 1 of the EAN codes is in both files.
I tested your AWK one-liner and it functions exactly as i needed it. Thank you for that.
I know but i am no perl guru. Sorry
---------- Post updated at 01:07 PM ---------- Previous update was at 11:28 AM ----------
It seems that the awk line still has a error in there
I just tested it with real data of which there where supposed to be 3 results in the merged file.
The strange thing is that 1 of the lines works fine but the others dont work. I am guessing that is because of the multiple EAN codes in file 1. When i put the EAN code in the first position it worked fine but on any other position then the first it does not work.
Thank you for the fast reply. I tested it and it seems to work superb of which you have my thanks. 1 question remains though. Could you perhaps explain the code perhaps so i know how it works?
awk '
BEGIN {FS = OFS = ";" ###### Setting Field seprator and output field seprator as ;
print "Barcode;Art_ours;Art_Sup;SKU;stock;price;manufacturer" ###### print Barcode;Art_ours;Art_Sup;SKU;stock;price;manufacturer in begining of the code execution.
}
NR == FNR {gsub (/ /, "", $1) ###### NR==FNR is a condition which willbe TRUE when first file is being read because FNR is a variable which will have the line number same as NR but FNR's value will be reset each time new file is being read and NR's value will be keep increasing till last file completes.
for (n = split($1, T, ","); n > 0; n--) S[T[n]]=$2 ###### starting a for loop here, in which an array named T created whose delimiter is ,(comma) only for $1 because split works on method of split(line/field, array_name, delimiter). Now we are taking the count of array T's element into variable n. and for loop will run till n's value is greater than 0. Then we have created array named S whose index is value of array T's value whose index is variable n and assigning array S's value to $2 of file1.
next ###### Now don't do any further action by next keyword and start from condition NR==FNR.
}
$1 in S {$2 = S[$1] OFS $2 ###### This will be executed while second file file2 is being read, so $1 in S means, any $1's value which is present in array S, if this is TRUE then perform action where make $2(second field's) value to S[$1] OFS $2's value and assign again it to $2 itself.
print ###### Now print the complete line(with new value of $2, which we had formed in previous step.)
}
' file[12] ###### mentioning Input_files file1 and file2 here.
awk '
BEGIN {FS = OFS = ";" # set in- and output separator
print "Barcode;..." # print header
}
NR == FNR {gsub (/ /, "", $1) # remove spaces from first field
for (n = split($1, T, ","); n > 0; n--) S[T[n]]=$2
# split SKUs into S array indexed by EANs
next # skip processing; goto next line
}
$1 in S {$2 = S[$1] OFS $2 # if EAN found in S arr: insert SKU into field 2
print # print modified line
}
' file[12]
Even though it has been a while before we finally started to use this code below, i am running against a (minor) problem.
#!/bin/sh
awk '
BEGIN {FS = OFS = ";"
print "ean;sku;pps_reference;mpn;stock;price;manufacturer"
}
NR == FNR {gsub (/ /, "", $1)
for (n = split($1, T, ","); n > 0; n--) S[T[n]]=$2
next
}
$1 in S {$2 = S[$1] OFS $2
print
}
' $1 $2 > $3
In file 1 i have the line:
0024066040145,0240660401452,4054318561209,5712505072332;P00000052;Hoya
and in file 2 i have the line:
0024066040145,0240660401452;887024;YDUVP052;50;15.86;Hoya
The strange thing that happens is that it wont join both lines in file 3 even though there are 2 different EAN codes which it can use to join them. Can someone explain how this code needs to be changed so this wont cause an issue?