Is it possible to remove redundant names in the 4th column?
Input:
cqWE 100 200 singapore;singapore
AZO 300 400 brazil;america;germany;ireland;germany
....
....
Output:
cqWE 100 200 singapore
AZO 300 400 brazil;america;germany;ireland
With Perl you could write something like this:
perl -lpe'
%_ = ();
s/[^\s]+$/join ";", grep !$_{$_}++, split ";", $&/e
' infile
With awk the code will be noisier.
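A 'noisy' awk version, run as: awk -f quincy.awk myFile
quincy.awk:
{
  split("", t)                 # clear the table of names already seen
  n = split($4, a, ";")
  $4 = ""
  for (i = 1; i <= n; i++)
    if (!(a[i] in t)) {        # first occurrence of this name
      $4 = (i == 1) ? a[i] : $4 ";" a[i]
      t[a[i]]                  # referencing the element records it in t
    }
  print
}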
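To try the same idea against the sample from the question without creating a script file, here is a minimal sketch that wraps a portable awk program in a shell function. The function name dedupe_col4 is made up for illustration and does not appear in the thread. It assumes whitespace-separated fields and a ';'-separated list in field 4. Because it assigns to $4, awk rebuilds the record with single spaces between fields.

```shell
# dedupe_col4: hypothetical helper name, not from the thread.
# Reads records on stdin and prints them with repeated names removed
# from the ";"-separated list in field 4, keeping first occurrences in order.
dedupe_col4() {
  awk '
  NF >= 4 {
    split("", seen)                  # portable way to empty an array
    n = split($4, name, ";")
    out = ""; k = 0
    for (i = 1; i <= n; i++)
      if (!(name[i] in seen)) {      # first time this name appears
        seen[name[i]] = 1
        out = (k++ ? out ";" : "") name[i]
      }
    $4 = out                         # assigning a field rebuilds $0 with OFS
  }
  { print }'
}

# Sample records taken from the question.
printf '%s\n' \
  'cqWE 100 200 singapore;singapore' \
  'AZO 300 400 brazil;america;germany;ireland;germany' |
  dedupe_col4
```

Records with fewer than four fields are printed unchanged, so the assignment to $4 never pads short lines.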
If you prefer awk and have a recent GNU awk (the four-argument split() used here is a gawk extension),
you can rebuild each record after the change with its original separators, even when the whitespace
between fields varies, so the original formatting is preserved:
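awk '{
  # t[] gets the fields, s[] the separator strings between them (gawk only)
  split($0, t, FS, s)
  # print every field but the last, each followed by its original separator
  for (i = 0; ++i < NF;)
    printf "%s", $i s[i]
  # the loop leaves i == NF, so t[i] is the last field
  n = split(t[i], tt, fs)
  delete _; lf = x
  # append each name (plus fs) only the first time it is seen
  for (i = 0; ++i <= n;)
    lf = lf (_[tt[i]]++ ? x : tt[i] fs)
  # drop the trailing fs
  print substr(lf, 1, length(lf) - 1)
}' fs=\; infile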