I have a .csv file with about 1000 rows and roughly 7 columns.
Before I insert this data into a table I have to parse and clean it based on the value of the first column, which is a phone-number string.
Here (111)222-3333 is considered a duplicate, and 2000 takes precedence over 1000,
so I have to remove the row with the values (111)222-3333 1000. How do I achieve this?
Any help is greatly appreciated.
I cannot manipulate the Excel file.
It comes from a third party, and we have to run a batch file to handle the data they send before inserting it into our DB.
---------- Post updated at 09:01 AM ---------- Previous update was at 08:57 AM ----------
I tried something like this:
awk '
{ s[$1]++ }
END {
    for (i in s) {
        if (s[i] > 1) {
            print i
        }
    }
}'
It wouldn't work: it treats only "(111)" as the duplicate key, not the whole number.
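For reference, here is my assumption about why $1 alone fails: awk splits on whitespace by default, so if the file writes the number as "(111) 222-3333" (with a space after the area code), it lands in two fields and the amount becomes the third:

```shell
# default FS is whitespace, so one "value" can span several fields
echo '(111) 222-3333 1000' | awk '{print "f1=" $1; print "f2=" $2; print "f3=" $3}'
# prints:
# f1=(111)
# f2=222-3333
# f3=1000
```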
So I changed it to this:
awk '
{ s[$1 $2 "-" $3]++ }
END {
    for (i in s) {
        if (s[i] > 1) {
            print i
        }
    }
}'
Still no help; it behaves as if I had written:
awk '
{ s[$0]++ }
END {
    for (i in s) {
        if (s[i] > 1) {
            print i
        }
    }
}'
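A sketch of a one-pass approach that might do the whole job, assuming the phone number spans the first two whitespace-separated fields (e.g. "(111) 222-3333") and the value to compare (1000 vs 2000) is the third field. The sample.txt file and the field positions are my assumptions and may need adjusting for the real file:

```shell
# hypothetical sample rows in the assumed layout:
# phone in fields 1-2, amount in field 3
printf '%s\n' \
  '(111) 222-3333 1000' \
  '(111) 222-3333 2000' \
  '(444) 555-6666 1500' > sample.txt

# one pass: for each phone number, keep the row with the largest amount
awk '
{
    key = $1 " " $2                        # phone number spans fields 1-2
    if (!(key in best) || $3 + 0 > best[key] + 0) {
        best[key] = $3                     # highest amount seen so far
        row[key]  = $0                     # full row that carried it
    }
}
END {
    for (k in row) print row[k]
}' sample.txt
```

The `$3 + 0` coercion forces a numeric comparison. Note that `for (k in row)` does not guarantee output order; pipe the result through `sort` if the load step needs the rows ordered.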