Duplicate value with different index

jiam912 · November 19, 2014, 3:26am

Hello Gents,

Please help me with this case.

Input

I need to check that all duplicate values have the same index.
Values are in columns 1-8 and the index is in columns 9-10.
In the first case, the duplicated value 10001010 has indexes G1, G2 and G3, and the third duplicated value, 10001014, has G1 and G2.

I would like to get output like this:

Value 10001010 has different code G1 G2 G3
Value 10001014 has different code G1 G2

If no errors are found, can I get a message like this:

Not errors found

Thanks for your help

hergp · November 19, 2014, 5:44am

This bash-script should do the trick:

#!/bin/bash

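declare -A array    # value (columns 1-8) -> list of distinct codes seen

while IFS= read -r line
do
    [[ -z $line ]] && continue    # an empty key is not a valid subscript
    val=${line:0:8}
    cod=${line:8:2}
    case "${array[$val]}" in
        *"$cod"*)    # code already recorded for this value
            ;;
        *)
            array[$val]="${array[$val]} $cod";;
    esac
done

for val in "${!array[@]}"
do
    # each code is stored with a leading space, so two spaces mean two codes
    if [[ ${array[$val]} == \ *\ * ]]
    then
        echo "Value $val has different code${array[$val]}"
        found=1
    fi
done

[[ -z $found ]] && echo "Not errors found"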

Execute it with

./script.sh <inputfile

jiam912 · November 19, 2014, 8:17am

Thanks for the answer. The script works fine,
but it takes a long time to process.
Is there a faster way to do it with awk?

hergp · November 19, 2014, 9:15am

Try this

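# Skip empty lines; they carry no value or code.
length($1) == 0 { next }

{
    val = substr($1, 1, 8);    # value: columns 1-8
    cod = substr($1, 9, 2);    # code: columns 9-10

    # Append the code only if it is not already recorded for this value.
    if (index(array[val], cod) == 0) array[val] = array[val] " " cod;
}

END {
    # Each code is stored with a leading space, so two spaces mean two codes.
    for (val in array)
        if (match(array[val], " .* .*"))
        {
            print "Value " val " has different code" array[val];
            found = 1
        }

    if (found == 0) print "Not errors found"
}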

Execute with

awk -f script.awk <inputfile
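
As a quick check, the same logic can be tried on a small file. This is only a sketch: the sample lines below are made up for illustration (the original input is not shown in the thread), sample.txt and check_codes are hypothetical names, and the awk program is repeated inline so the example runs on its own. The output is piped through sort because awk does not guarantee the order of a "for (val in array)" loop.

```shell
# Hypothetical sample data: an 8-character value followed by a 2-character code.
cat > sample.txt <<'EOF'
10001010G1
10001010G2
10001010G3
10001012G1
10001012G1
10001014G1
10001014G2
EOF

# check_codes is an illustrative wrapper around the awk logic shown above.
# It reads the files given as arguments, or standard input if there are none.
check_codes() {
    awk '
    {
        val = substr($1, 1, 8)
        cod = substr($1, 9, 2)
        if (index(array[val], cod) == 0) array[val] = array[val] " " cod
    }
    END {
        for (val in array)
            if (match(array[val], " .* .*")) {
                print "Value " val " has different code" array[val]
                found = 1
            }
        if (found == 0) print "Not errors found"
    }' "$@" | sort
}

check_codes sample.txt
```

With this sample, values 10001010 and 10001014 are reported, while 10001012 is not, because its two lines carry the same code.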

jiam912 · November 19, 2014, 9:23am

Dear hergp,

Thanks a lot for your help. I have many files to merge before I get the complete file to read with your script.

How can I have the file read from inside the script?

For example, I will merge many files with the extension .xls, then

cat *.xls* > database

How can I add this to your script and have it read the file that is created?

That way I would type only the name of the script, and by default it would know that the file to read is

database

Regards

hergp · November 19, 2014, 9:27am

awk can read multiple files in one run; just list all the filenames or use a wildcard expression on the command line:

awk -f script.awk *.xls

You do not need cat for this. If you need the intermediate file database, then try

cat *.xls | tee database | awk -f script.awk
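
If the goal is to type only one command and have the file name default to database, a small wrapper function is one way to do it. This is a sketch under assumptions: merge_and_check is a hypothetical name, script.awk is assumed to be in the current directory, and the files to merge are assumed to match *.xls there.

```shell
# merge_and_check [DB]: hypothetical helper, not part of the scripts above.
# Concatenates every *.xls file in the current directory into DB
# (default: "database") and then runs script.awk on the merged file.
merge_and_check() {
    db=${1:-database}               # fall back to "database" if no name is given
    cat ./*.xls > "$db" || return 1 # stop if the merge fails (e.g. no .xls files)
    awk -f script.awk "$db"
}
```

Saved in a script file with a final line merge_and_check "$@", it can be run without arguments to use database, or with one argument to pick another file name.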

Hope this helps.