Hi guys,
File A.txt
UID,HD1,HD2,HD3,HD4
1,2,33,44,55
2,10,14,15,16
File B.txt
UID
HD1
HD4
A.txt B.txt >>>Output.txt
UID,HD1,HD4
1,2,55
2,10,16
What specifically are you looking to do and what have you tried to solve this...
I just want to copy data from one file to another based on header names.
And what have you tried to solve it on your own...
I have tried the following:
$ cols=($(sed '1!d;s/, /\n/g' $A | grep -nf $B | sed 's/:.*$//'))
$ cut -d ',' -f 1$(printf ",%s" "${cols[@]}") $A
You weren't far off. Your field separator is a comma, not a comma followed by a space. So, if you change your first sed from:
sed '1!d;s/, /\n/g' $A
to:
sed '1!d;s/,/\n/g' $A
as in:
#!/bin/ksh
A=A.txt
B=B.txt
cols=($(sed '1!d;s/,/\n/g' "$A" | grep -nf "$B" | sed 's/:.*$//'))
cut -d ',' -f 1$(printf ",%s" "${cols[@]}") "$A"
it seems to do what you want. You might also want to consider:
#!/bin/ksh
A='A.txt'
B='B.txt'
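awk '
BEGIN { FS = OFS = ","
}
FNR == NR {
    # First file (B.txt): remember the wanted headers in order.
    h[++hc] = $0
    next
}
FNR == 1 {
    # Header line of A.txt: find the column number of each wanted header.
    for(i = 1; i <= hc; i++) {
        for(j = 1; j <= NF; j++) {
            if($j == h[i]) {
                o[i] = j
                break
            }
        }
        if(j > NF) {
            printf("Header \"%s\" not found in file \"%s\".\n",
                h[i], FILENAME)
            exit 1
        }
    }
}
{   # Every line of A.txt, header included: print the selected columns.
    for(i = 1; i <= hc; i++)
        printf("%s%s", $o[i], (i == hc) ? ORS : OFS)
}' "$B" "$A"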
which invokes awk once instead of invoking sed twice, grep once, and cut once; so it should run a bit faster.
As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
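To see why the corrected sed/grep/cut pipeline above works, here is a rough sketch that prints the intermediate column list before calling cut. It recreates the sample files from the first post in a scratch directory. Two things are my own substitutions, not part of the original suggestion: tr replaces sed 's/,/\n/g' (the \n in a sed replacement is not portable), and grep gets -x and -F so that each header in B.txt has to match a whole field name literally.

```shell
# Recreate the sample files from the first post in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'UID,HD1,HD2,HD3,HD4' '1,2,33,44,55' '2,10,14,15,16' > "$dir/A.txt"
printf '%s\n' 'UID' 'HD1' 'HD4' > "$dir/B.txt"

# Stage 1: take the header line of A.txt and put one field name per line.
# Stage 2: grep -n prefixes each matching name with its line number,
#          which is also its column number in A.txt.
#          -x (whole line) and -F (fixed strings) are assumptions added here.
# Stage 3: keep only the number, then join the numbers with commas.
cols=$(head -n 1 "$dir/A.txt" | tr ',' '\n' |
       grep -nxFf "$dir/B.txt" | cut -d: -f1 | paste -sd, -)
echo "$cols"

# Final step: let cut pick those columns from every line of A.txt.
out=$(cut -d, -f "$cols" "$dir/A.txt")
echo "$out"

rm -rf "$dir"
```

Keep in mind that cut always prints the selected fields in the order they appear in A.txt, whatever order the headers have in B.txt. If the order in B.txt matters, the awk approach is the better fit.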
# cat A.txt
UID,HD1,HD2,HD3,HD4
1,2,33,44,55
2,10,14,15,16
# cat B.txt
UID
HD1
HD4
Solution
# awk 'FNR==NR{h[$0]=$0;next}FNR==1{for(;i++<NF;){if($i==h[$i]){o[l++]=i}}}{for(_ in o)printf "%s%s",$o[_],(_==(l-1))?RS:FS}' FS=\, B.txt A.txt
UID,HD1,HD4
1,2,55
2,10,16
That doesn't preserve the desired sequence of columns for longer lists: the order in which for (i in array) visits the array elements is not specified (see man awk).
Small adaptation:
awk '
BEGIN     {OFS = FS = ","}
FNR == NR {h[$0] = NR            # header name -> position in the wanted list
           MX = NR
           next
          }
FNR == 1  {for (i=1; i<=NF; i++) if ($i in h) a[h[$i]] = i
          }
          {for (i=1; i<=MX; i++) printf "%s%s", $a[i], (i==MX)?RS:OFS
          }
' file2 file1
https://www.gnu.org/software/gawk/manual/html_node/Controlling-Array-Traversal.html
Based on my experience, awk uses incremental traversal by default; maybe I'm wrong.
You're wrong. RudiC is correct. As an example, on OS X version 10.11.3 the command:
printf '%s\n' 1 2 3|awk '{a[$1]}END{for(i in a)print i}'
produces the output:
2
3
1
Using a different version of awk should produce the same lines of output, but the order in which they are printed is explicitly left unspecified by the standards.
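If the order matters, a common way around this is to index the array by a counter and walk it with a counted loop, so the loop itself fixes the order. A minimal sketch using the same three input lines as the example above (the array name a and the use of NR as the index are just illustrative choices):

```shell
out=$(printf '%s\n' 1 2 3 | awk '
{ a[NR] = $1 }                  # index by record number, not by value
END {
    for (i = 1; i <= NR; i++)   # counted loop: order does not depend on "in"
        print a[i]
}')
echo "$out"
```

This is the same idea the scripts in this thread use when they loop from 1 to hc or from 1 to MX instead of using for (i in array).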
Thanks Guys.
Perfect ....
But it gets stuck when a header is not found in file2.txt...
Does the awk script I suggested in post #5 in this thread get "Stuck" in that case; or does it print a diagnostic message and exit?
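One way to check is to feed that script a header that does not exist and look at what comes back. The sketch below is only a test harness: it uses a shortened A.txt, a made-up header HD9 that A.txt does not contain, and a more compact layout of the same awk logic. The variable names msg and status are mine.

```shell
dir=$(mktemp -d)
printf '%s\n' 'UID,HD1,HD2,HD3,HD4' '1,2,33,44,55' > "$dir/A.txt"
printf '%s\n' 'UID' 'HD9' > "$dir/B.txt"    # HD9 is hypothetical; A.txt lacks it

# Capture the script output and its exit status.
msg=$(awk '
BEGIN { FS = OFS = "," }
FNR == NR { h[++hc] = $0; next }            # wanted headers, in order
FNR == 1 {
    for (i = 1; i <= hc; i++) {
        for (j = 1; j <= NF; j++)
            if ($j == h[i]) { o[i] = j; break }
        if (j > NF) {                       # inner loop ran off the end: no match
            printf("Header \"%s\" not found in file \"%s\".\n", h[i], FILENAME)
            exit 1
        }
    }
}
{ for (i = 1; i <= hc; i++) printf("%s%s", $o[i], (i == hc) ? ORS : OFS) }
' "$dir/B.txt" "$dir/A.txt")
status=$?

echo "exit status: $status"
echo "$msg"
rm -rf "$dir"
```

With this input the script stops at the header line of A.txt, so no data lines are printed, and the non-zero exit status lets a calling script detect the problem.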