Deciphering AWK code

anon23021223 · November 28, 2019, 5:24am

Dear experts,
I am a relative novice in Unix and came across a very useful piece of code that I regularly use in my research without really understanding it. Could any of the experienced members kindly explain briefly what the code actually does?

Many thanks in advance

The script is

awk '!(a[$1]) {a[$1]=$0; next} a[$1] {w=$1; $1=""; a[w]=a[w] $0} END {for (i in a) print a[i]}' FS="\t" OFS="\t" A.txt

The original title is: Find duplicates in column 1 and merge their lines (awk?)

RudiC · November 28, 2019, 5:45am

Here you are:

awk '
!(a[$1])        {a[$1]=$0                       # if array element indexed by $1 is unset or 0, set it to
                                                # the line (i.e. collect first occurrences of $1)
                 next                           # continue with the next input line
                }
a[$1]           {w    = $1                      # if set, save $1 in temp variable
                 $1   = ""                      # and remove it (but leave FS intact)
                 a[w] = a[w] $0                 # then append line to resp. array element
                }
END             {for (i in a) print a[i]     # print all elements containing collected lines
                                                # be aware that the order of elements is unspecified
                }
' FS="\t" OFS="\t" file

Please note how consistent structuring (e.g. indentation) of the code helps in reading / understanding / seeing patterns in it.
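
To make the explanation concrete, here is a minimal sketch that runs the same script on a made-up input. The sample data and the temporary file are assumptions for illustration only, not part of the original question. The output is piped through sort because the order of for (i in a) is unspecified.

```shell
# Hypothetical sample data: tab-separated, with the key in column 1.
tmp=$(mktemp)
printf 'k1\ta\tb\nk2\tc\nk1\td\n' > "$tmp"

merged=$(awk '
!(a[$1]) {a[$1]=$0; next}              # first occurrence of a key: store the whole line
a[$1]    {w=$1; $1=""; a[w]=a[w] $0}   # repeated key: append the remaining fields
END      {for (i in a) print a[i]}     # one merged line per key
' FS="\t" OFS="\t" "$tmp" | sort)
rm -f "$tmp"

printf '%s\n' "$merged"
```

With this input, the two k1 lines are merged into a single line containing k1, a, b and d, while the k2 line is printed unchanged (all fields tab-separated).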

MadeInGermany · November 28, 2019, 2:38pm

A comment:
The existence test a[$1] can give different results in different awk versions, and it also creates an empty array element if there was none.
A better test is ($1 in a).
I think one should recode the whole thing:

awk '{i=$1; $1=""; a[i]=(a[i] $0)} END {for (i in a) print (i a[i])}' FS="\t" OFS="\t" A.txt

This version stores $1 (field #1) only as the index, not as part of the value. Therefore, in the END block the index is printed before the value.
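
The side effect of testing an element by value can be checked with a small sketch. The variable names by_value and by_membership and the key "x" are made up for this illustration; each awk program counts how many elements the array holds after the test.

```shell
# Testing a["x"] by value creates the element as a side effect ...
by_value=$(awk 'BEGIN { if (a["x"]) print "set"; n = 0; for (k in a) n++; print n }')

# ... whereas the membership test ("x" in a) leaves the array untouched.
by_membership=$(awk 'BEGIN { if ("x" in a) print "set"; n = 0; for (k in a) n++; print n }')

printf 'value test: %s element(s), membership test: %s element(s)\n' "$by_value" "$by_membership"
```

The value test leaves one (empty) element behind, the membership test leaves none.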

Scrutinizer · November 28, 2019, 3:33pm

In for some golf?

awk '{i=$1; $1=A[i]; A[i]=$0} END{for(i in A) print i A[i]}' FS="\t" OFS="\t" A.txt
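
As a rough check, the two rewrites can be compared on the same made-up input. This is only a sketch: the sample data is invented, and the golfed variant is written with i=$1 and explicit [i] subscripts, which is an assumed reading of the one-liner rather than a confirmed original.

```shell
# Hypothetical sample data: tab-separated, with the key in column 1.
tmp=$(mktemp)
printf 'k1\ta\tb\nk2\tc\nk1\td\n' > "$tmp"

# Index-only variant: $1 is kept as the array index and blanked in the stored value.
v1=$(awk '{i=$1; $1=""; a[i]=(a[i] $0)} END {for (i in a) print (i a[i])}' FS="\t" OFS="\t" "$tmp" | sort)

# Golfed variant (assumed reading): $1 is replaced by the text collected so far,
# so the rebuilt $0 is already the new accumulated value.
v2=$(awk '{i=$1; $1=A[i]; A[i]=$0} END{for(i in A) print i A[i]}' FS="\t" OFS="\t" "$tmp" | sort)
rm -f "$tmp"

if [ "$v1" = "$v2" ]; then echo "same output"; else echo "outputs differ"; fi
```

Both variants rely on awk rebuilding $0 with OFS when $1 is assigned, which is why OFS is set to a tab as well.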

anon23021223 · November 29, 2019, 2:05am

Many thanks, everybody!
Your help is highly appreciated.
Cheers