Duplicate value with different index

Hello Gents,

Please help me with this case.

Input

10001010G1
10001010G1
10001010G1
10001010G2
10001010G3
10001012G1
10001012G1
10001012G1
10001012G1
10001014G1
10001014G1
10001014G2

The task is to check that every duplicated value always has the same index.
Values are in columns 1-8, and the index is in columns 9-10.
In the example above, the first duplicated value (10001010) has indexes G1, G2 and G3, and the third duplicated value (10001014) has G1 and G2.
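For the bash answer in this thread, note that bash substring offsets start at 0 while the column numbers above start at 1, so columns 1-8 are `${line:0:8}` and columns 9-10 are `${line:8:2}`. A minimal sketch of that split on one sample line (`split_line` is a hypothetical helper name):

```bash
#!/bin/bash
# Split one input line into the value (columns 1-8) and the index (columns 9-10).
# Bash offsets are 0-based: column 1 is offset 0 and column 9 is offset 8.
split_line() {
    local line=$1
    local value=${line:0:8}   # characters at offsets 0..7
    local code=${line:8:2}    # characters at offsets 8..9
    echo "$value $code"
}

split_line 10001010G1   # prints: 10001010 G1
```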

I would like to get output like this:

Value 10001010 has different code G1 G2 G3
Value 10001014 has different code G1 G2

If no errors are found, can I get a message like this:

Not errors found 

Thanks for your help

This bash script should do the trick:

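#!/bin/bash

# array maps each value (columns 1-8) to the distinct codes (columns 9-10)
# seen with it, stored as a space-separated string, e.g. " G1 G2 G3"
declare -A array

while IFS= read -r line
do
    [[ -z $line ]] && continue              # ignore blank lines
    val=${line:0:8}
    cod=${line:8:2}
    case "${array[$val]} " in
        *" $cod "*)                         # code already recorded for this value
            ;;
        *)
            array[$val]="${array[$val]} $cod";;
    esac
done

# Two or more codes means the string contains at least two spaces.
# Note: bash does not return associative-array keys in input order.
for val in "${!array[@]}"
do
    if [[ ${array[$val]} == \ *\ * ]]
    then
        echo "Value $val has different code${array[$val]}"
        found=1
    fi
done

[[ -z $found ]] && echo "Not errors found"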

Execute it with

script.sh <inputfile
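If you only need the list of offending values (without their codes), a pipeline of standard tools is one possible alternative to the per-line bash loop. This is a sketch: `bad_values` is a hypothetical function name, it does not print the "Not errors found" message, and its output is sorted rather than in input order.

```bash
#!/bin/bash
# List every value (columns 1-8) that appears with more than one distinct
# index (columns 9-10):
#   cut -c1-10 : keep value+index only
#   sort -u    : one line per distinct value+index pair (grouped by value)
#   cut -c1-8  : keep only the value
#   uniq -d    : print values that still occur more than once
bad_values() {
    cut -c1-10 "$@" | sort -u | cut -c1-8 | uniq -d
}

# Sample data from the question; prints 10001010 and 10001014
printf '%s\n' 10001010G1 10001010G1 10001010G1 10001010G2 10001010G3 \
    10001012G1 10001012G1 10001012G1 10001012G1 \
    10001014G1 10001014G1 10001014G2 | bad_values
```

With a file, call it as `bad_values inputfile`.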

Thanks for the answer. The script works fine, but it takes a long time to process.
Is there a faster way to do it with awk?

Try this awk version:

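# Skip blank lines (NF == 0); take value and code from the first field
NF {
    val = substr($1, 1, 8)
    cod = substr($1, 9, 2)

    # Append each code only once; padding with spaces makes the match exact
    if (index(array[val] " ", " " cod " ") == 0)
        array[val] = array[val] " " cod
}

END {
    # Two or more codes give a string like " G1 G2" (at least two spaces).
    # for (val in array) visits the keys in no particular order.
    for (val in array)
        if (match(array[val], " .* .*"))
        {
            print "Value " val " has different code" array[val]
            found = 1
        }

    if (!found) print "Not errors found"
}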

Execute with

awk -f script.awk <inputfile
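A variant of the awk approach, offered as a sketch rather than a drop-in replacement: it records each (value, code) pair as a composite key `seen[val, cod]`, so codes are compared exactly instead of by searching a string with `index()`, and it pipes the result through `sort` because `for (val in ...)` visits keys in no particular order. `check_codes` is a hypothetical shell function name.

```bash
#!/bin/bash
# Report values (columns 1-8) that occur with more than one distinct
# code (columns 9-10). Reads the files given as arguments, or stdin.
check_codes() {
    awk '
    NF {
        val = substr($1, 1, 8)
        cod = substr($1, 9, 2)
        if (!((val, cod) in seen)) {     # first time this pair appears
            seen[val, cod] = 1
            codes[val] = codes[val] " " cod
            n[val]++                     # distinct codes seen for val
        }
    }
    END {
        for (val in n)
            if (n[val] > 1) {
                print "Value " val " has different code" codes[val]
                found = 1
            }
        if (!found) print "Not errors found"
    }' "$@" | sort
}
```

Usage: `check_codes inputfile`, or `check_codes *.xls` for several files. On the sample input it should report 10001010 (G1 G2 G3) and 10001014 (G1 G2), one line each.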

Dear hergp,

Thanks a lot for your help. I have many files to merge before I get the complete file to read with your script.

How can I specify the file to be read inside the script?

For example, I will merge many files with the extension .xls (just an example):

cat *.xls* > database

How can I add this to your script and read the file it creates?

That way I would just type the name of the script, and by default it would know that the file to read is

database

Regards

awk can read multiple files in one run; just put all the filenames, or a wildcard expression, on the command line:

awk -f script.awk *.xls

You do not need cat for this. If you do need the intermediate file database, try:

cat *.xls | tee database | awk -f script.awk
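If you would rather type just the script name and have it read database by default, one way is a small wrapper that substitutes database when no arguments are given. This is a sketch under assumptions: `check_files` is a hypothetical name, and the awk program is essentially the one from script.awk above, inlined so the example is self-contained.

```bash
#!/bin/bash
# Check the files named as arguments, or ./database when called with none.
check_files() {
    if [ $# -eq 0 ]; then
        set -- database              # default input file
    fi
    awk '
    NF {
        val = substr($1, 1, 8); cod = substr($1, 9, 2)
        if (index(array[val] " ", " " cod " ") == 0)
            array[val] = array[val] " " cod
    }
    END {
        for (val in array)
            if (match(array[val], " .* .*")) {
                print "Value " val " has different code" array[val]
                found = 1
            }
        if (!found) print "Not errors found"
    }' "$@"
}

# Usage:
#   check_files                        # reads ./database
#   check_files part1.xls part2.xls    # reads the listed files as one stream
```

Because awk treats all input files as one stream, a value whose codes are split across different files is still reported.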

Hope this helps.
