Awk: check element in array and its value

Hello,
I want to see whether an element exists in an array and, if so, check its corresponding value.

Column 4 is the position and column 1 is its chromosome. A position can appear more than once on the same chromosome.

I want to check whether the same position also exists on a different chromosome.

Data format:

2       rs267607987     0       47702411        0       T
2       seq-rs587779123 0       47702411        0       I
2       seq-rs587779124 0       47702411        0       D
11      seq-rs730880711 0       47364479        0       I
11      seq-rs863225110 0       47364479        0       I
11      seq-rs863225271 0       47364479        0       I
11      seq-rs397515973 0       47359006        0       D
11      seq-rs727503187 0       47359006        0       D
11      seq-rs730880654 0       47359006        0       D
17      DUP-rs80358150  0       41209068        0       C
17      rs273901754     0       41209068        0       D
17      rs80358150      0       41209068        0       C
17       seq-rs5827779124 0       47702411        0       D

I want to check whether a position in column 4 occurs with different values in column 1.
In this case:

2       seq-rs587779124 0       47702411        0       D
17       seq-rs5827779124 0       47702411        0       D

The following code fails:

awk ' {if ($4 in arr) && if (arr[$4]==$1){ print arr[$4],$4} else {arr[$4]=$1} }' testcol.txt

Error:

awk:  {if ($4 in arr) && if (arr[$4]==$1){ print arr[$4],$4} else {arr[$4]=$1} }
awk:                  ^ syntax error

Would really appreciate any guidance here.

Which of the three lines for chromosome 2 should be selected?

It's not important which of the three lines is selected from chromosome 2. I'd like to see whether the same position is present across different chromosomes.

OK.

For your code, put the && inside a single if's parentheses:

awk ' {if (($4 in arr) && (arr[$4]==$1)) { print arr[$4],$4} else {arr[$4]=$1} }' file

But this won't print your desired result, because it prints when the stored chromosome equals the current one rather than when it differs. Try

awk '($4 in LN) && ($1 != CHR[$4]) {print LN[$4]; print} {LN[$4] = $0; CHR[$4] = $1}' file
2       seq-rs587779124 0       47702411        0       D
17       seq-rs5827779124 0       47702411        0       D

awk ' {if (! $4 in arr){arr[$4]=$1} if ( ($4 in arr) && (arr[$4]!=$1 ) ){print "Issue"} }' testcol.txt

I got the && working, but the condition fails and I can't figure out why. I could copy and paste your code, but I would like to know the bug in mine.

Use parentheses around the first $4 in arr. Because ! binds more tightly than in, you are currently checking (!$4) in arr.
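
To see the precedence effect in isolation, here is a minimal sketch. The array arr is left empty, as it is when the first row is read, and the single input row is invented for the demo; result, unparenthesized and parenthesized are names chosen only for this example.

```shell
result=$(echo "2 rs1 0 47702411 0 T" | awk '
    {
        # Parsed as (!$4) in arr: !$4 is 0 for a non-zero position,
        # so this asks whether the key "0" exists in arr.
        unparenthesized = (! $4 in arr) ? "true" : "false"

        # Negates the membership test itself: is $4 absent from arr?
        parenthesized = (!($4 in arr)) ? "true" : "false"

        print unparenthesized, parenthesized
    }
')

echo "$result"
```

This prints false true: the unparenthesized test is false even though the position has not been stored yet, so the assignment inside that if never runs and the array stays empty.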

I can't get your code working if I change the order.

 awk ' {if (! $4 in CHR){ CHR[$4]=$1; LN[$4]=$0 } { if ( ($4 in LN) && (CHR[$4] != $1 ) ){print } } }' testcol.txt

I'm totally lost as to what's wrong :frowning:

Hi, try:

(!($4 in CHR))

rather than

(! $4 in CHR)
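
For completeness, here is roughly how the reordered command might look with that change applied. Treat it as a sketch: the rows are a subset of the sample data from the question, and rows and out are names used only here.

```shell
rows='2       rs267607987     0       47702411        0       T
2       seq-rs587779124 0       47702411        0       D
17      rs80358150      0       41209068        0       C
17       seq-rs5827779124 0       47702411        0       D'

out=$(printf '%s\n' "$rows" | awk '{
    # First time this position is seen: remember its chromosome and line.
    if (!($4 in CHR)) { CHR[$4] = $1; LN[$4] = $0 }

    # Position already stored under a different chromosome: print this row.
    if (($4 in LN) && (CHR[$4] != $1)) { print }
}')

echo "$out"
```

On these rows it prints only the chromosome 17 row for position 47702411. Unlike the earlier one-liner, it does not print the stored line LN[$4], and it compares against the first chromosome recorded for a position rather than the latest one.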

Got it. Thank you. :slight_smile: