Hi,
I have a requirement. For example: I have a text file with the pipe symbol (|) as delimiter and 4 columns a, b, c, d. Here a and b are the primary key columns.
I want to process that file to find duplicates and null values in the primary key columns (a, b). I want to write the unique records whose PKs are not null into one file, and the duplicate records and the records having null PK columns into another file.
awk -F\| '
NR == 1 {                               # skip the header line
    next
}
{
    I = $1 OFS $2                       # composite key built from the two PK columns
    if ( ( I in U ) || !($1 && $2) )    # key seen before, or a PK column is null
        print $0 > "dupl.txt"
}
$1 && $2 && !(I in U) {                 # first occurrence with both PKs non-null
    U[I] = $0                           # remember it, keyed by the composite key
}
END {
    for ( k in U )                      # note: awk iterates arrays in unspecified order
        print U[k] > "uniq.txt"
}
' abc.txt
This program creates two output files: dupl.txt, with the duplicate and null-PK records, and uniq.txt, with the unique records.
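For instance, with a hypothetical abc.txt like this (first line is a header):

a|b|c|d
1|2|x|y
1|2|p|q
|5|m|n
3|4|r|s

uniq.txt would contain the first occurrence of each non-null key pair:

1|2|x|y
3|4|r|s

and dupl.txt would contain the repeated key and the null-PK record:

1|2|p|q
|5|m|n

Note that uniq.txt may not preserve the input order, since for ( k in U ) iterates in an unspecified order.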
What should the output be? Or, more explicitly, does order matter: are the keys 11|55 and 55|11 duplicates? And does each pair of keys have to be unique, or does each individual key have to be unique: are 55|11 and 11|30 duplicates because 11 is a common key? (If the answer to either of these is yes, Yoda's script won't work for you.)
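If key order should not matter, one way to handle it is to canonicalize the key before the lookup, so 11|55 and 55|11 map to the same array entry. This is only a sketch of that idea, reusing the script above with the rest of the logic unchanged:

awk -F\| '
NR == 1 {
    next
}
{
    # Canonical key: put the smaller PK first, so 11|55 and 55|11
    # collide. awk compares numerically when both fields look numeric,
    # and as strings otherwise.
    I = ($1 <= $2) ? ($1 OFS $2) : ($2 OFS $1)
    if ( ( I in U ) || !($1 && $2) )
        print $0 > "dupl.txt"
}
$1 && $2 && !(I in U) {
    U[I] = $0
}
END {
    for ( k in U )
        print U[k] > "uniq.txt"
}
' abc.txt

The individual-key interpretation (55|11 and 11|30 clashing on 11) would need a different structure, e.g. one seen-array per PK column, which is a bigger change than the one-line tweak above.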