Hello,
For our work we use several scripts to gather and combine data for use in our webshop. Until now we did not have any problems, but a couple of days ago we noticed some mismatches between imports.
Several barcodes were matched even though they belonged to a completely different product. The scripts aren't checking for this yet, so we need to upgrade them to check for it and give us a list of lines to review so we can update the listing.
The supplier sends us a CSV file with data as shown below:
supplier_clean.csv
ean;pps_reference;stock;price;sku;mpn;manufacturer
4260010852693;1043154;84;743.42;P00000172;70100118555;Fujifilm
4960999575285;273189;9400;141.80;P00009067;2768B016;Canon
0013803092899,4960999575292;27433196;44;44.94;P00022338;2768B017;Canon
8715946388540;2944686;1030;47.76;P00000878;C13S042167;Epson
0088698115763,3141725001174;3654125;20;54.80;P00004251;C1825A;Hewlett Packard
This file is being joined to another file with the following code, more on this here:
joining.sh
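#!/bin/sh
# Arguments: website file, supplier file, output file
awk '
BEGIN {
    FS = OFS = ";"
    print "ean;sku;pps_reference;mpn;stock;price;manufacturer;supplier_code"
}
# Strip spaces from the barcode field of every line.
{ gsub(/ /, "", $1) }
# First file (website): remember the SKU for every barcode in the list.
NR == FNR {
    for (n = split($1, T, ","); n > 0; n--) S[T[n]] = $2
    next
}
# Second file (supplier): on the first known barcode, insert the SKU after the EAN and print.
{
    for (n = split($1, T, ","); n > 0; n--)
        if (T[n] in S) {
            $2 = S[T[n]] OFS $2
            print
            next
        }
}
' "$1" "$2" > "$3"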
The script is called as follows:
join_prijslijst.sh website_clean.csv supplier_clean.csv results.csv
website_clean.csv contains the following data (short example):
website_clean.csv
Barcode;Sku;Manufacturer
0696720480781,4000567150589;P00002801;Braun Photo Technik
4000461043031;P00002800;Dörr
4000461034213,4000461037818;P00002799;Dörr
0891257001526,8912570015266;P00002634;Gary Fong
0891257001106;P00002633;Gary Fong
0887111646026;P00002632;HP
0887111515629;P00002631;HP
The problem is that the manufacturer check has to happen during the join, and to make matters worse, some suppliers use different names for the same manufacturer (for example HP, Hewlett Packard, etc.).
My idea is to have another file that lists all the possible names of each manufacturer (see below for an example). The manufacturer from website_clean.csv is looked up in that file, and the names found there are compared against the manufacturer in supplier_clean.csv. If the correct name is in there, the join continues as normal; if not, the line is written to a separate file which we can then check manually. In this separate file I need both lines (the website line and the supplier line), so we can check which would be the correct name/product.
manufacturer_check.csv
manufacturer
HP,Hewlett Packard, HP INC.
Canon
Fujifilm
Epson
WD,Western Digital
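The idea described above could be sketched roughly as follows. This is only a minimal sketch under several assumptions, not a tested drop-in replacement: the function name join_checked, the helper names norm and canon, and the fifth argument (a review file for mismatches) are all made up for illustration. It assumes every file has a header line, that the first name on each line of manufacturer_check.csv is treated as the "canonical" one, that the manufacturer is the third column of the website file and the last column of the supplier file, and that names missing from the alias file are simply compared with each other after lower-casing and trimming. It writes no header line to the results file.

```shell
#!/bin/sh
# Sketch: join on barcode, but only accept a match when the manufacturers agree.
# join_checked ALIASES WEBSITE SUPPLIER RESULTS MISMATCHES   (hypothetical name and arguments)
join_checked() {
    awk -v mismatch="$5" '
    BEGIN { FS = OFS = ";"; printf "" > mismatch }   # start with an empty review file

    # Lower-case, trim and collapse whitespace so "HP INC." and " hp inc." compare equal.
    function norm(s) {
        s = tolower(s)
        gsub(/^[ \t\r]+|[ \t\r]+$/, "", s)
        gsub(/[ \t]+/, " ", s)
        return s
    }
    # Map a name to the first name on its alias line; unknown names map to themselves.
    function canon(s) {
        s = norm(s)
        return (s in alias) ? alias[s] : s
    }

    FNR == 1 { file++; next }        # count input files and skip each header line

    file == 1 {                      # alias file: comma-separated names per manufacturer
        n = split($0, A, ",")
        for (i = 1; i <= n; i++) alias[norm(A[i])] = norm(A[1])
        next
    }

    { gsub(/ /, "", $1) }            # as in the original script: drop spaces in the barcode field

    file == 2 {                      # website file: Barcode;Sku;Manufacturer
        for (n = split($1, T, ","); n > 0; n--) {
            sku[T[n]] = $2; man[T[n]] = $3; web[T[n]] = $0
        }
        next
    }

    {                                # supplier file: manufacturer is the last field
        bad = ""
        for (n = split($1, T, ","); n > 0; n--) {
            if (!(T[n] in sku)) continue
            if (canon(man[T[n]]) == canon($NF)) { $2 = sku[T[n]] OFS $2; print; next }
            if (bad == "") bad = T[n]            # barcode known, manufacturer differs
        }
        if (bad != "") {             # keep both lines so they can be checked by hand
            print "website", web[bad] > mismatch
            print "supplier", $0 > mismatch
        }
    }
    ' "$1" "$2" "$3" > "$4"
}
```

It could then be called like this, where mismatches.csv is a hypothetical name for the review file:

join_checked manufacturer_check.csv website_clean.csv supplier_clean.csv results.csv mismatches.csv

A supplier line is only reported as a mismatch when none of its barcodes leads to a website line with an agreeing manufacturer, so a line with several EANs still joins if any one of them fits.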
I hope it is clear what needs to happen to make this work. If not, let me know and I will try to explain it better.