Grep Data Based on Header

Hi guys,

File A.txt

UID,HD1,HD2,HD3,HD4
1,2,33,44,55
2,10,14,15,16

File B.txt

UID
HD1
HD4

Desired result from A.txt and B.txt, written to Output.txt:

UID,HD1,HD4
1,2,55
2,10,16

What specifically are you looking to do and what have you tried to solve this...

Just copy data from one file to another based on header name.

And what have you tried to solve it on your own...

I have tried the following:

$ cols=($(sed '1!d;s/, /\n/g' $A | grep -nf $B | sed 's/:.*$//'))

$ cut -d ',' -f 1$(printf ",%s" "${cols[@]}") $A 

You weren't far off. Your field separator is a comma, not a comma followed by a space. So, if you change your first sed from:

sed '1!d;s/, /\n/g' $A

to:

sed '1!d;s/,/\n/g' $A

as in:

#!/bin/ksh
A=A.txt
B=B.txt
cols=($(sed '1!d;s/,/\n/g' "$A" | grep -nf "$B" | sed 's/:.*$//'))
cut -d ',' -f 1$(printf ",%s" "${cols[@]}") "$A"

it seems to do what you want. You might also want to consider:

#!/bin/ksh
A='A.txt'
B='B.txt'
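awk '
BEGIN {	FS = OFS = ","
}
# First file (B.txt): save the wanted header names in order.
FNR == NR {
	h[++hc] = $0
	next
}
# Header line of the second file (A.txt): find the column of each wanted name.
FNR == 1 {
	for(i = 1; i <= hc; i++) {
		for(j = 1; j <= NF; j++) {
			if($j == h[i]) {
				o[i] = j
				break
			}
		}
		if(j > NF) {
			printf("Header \"%s\" not found in file \"%s\".\n",
			    h[i], FILENAME)
			exit 1
		}
	}
}
# Every line of A.txt, header included: print the selected columns.
{	for(i = 1; i <= hc; i++)
		printf("%s%s", $o[i], (i == hc) ? ORS : OFS)
}' "$B" "$A"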

which invokes awk once instead of invoking sed twice, grep once, and cut once; so it should run a bit faster.

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
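
One more thing to keep in mind with the sed/grep/cut variant: grep -f treats each line of B.txt as a pattern that may match anywhere in a line, so HD1 would also select a column named HD10. A minimal sketch that asks grep for exact whole-line matches instead (-x and -F), assuming that is the behaviour wanted. The function name pick_cols and the sample data with an HD10 column are made up for illustration:

```shell
# pick_cols DATA HEADERS (hypothetical helper, not from the thread):
# print the columns of DATA whose header names appear as whole lines in HEADERS.
pick_cols() {
    data=$1 headers=$2
    # Put each header name on its own line, keep the line numbers of the
    # exact matches, and join those numbers with commas for cut.
    cols=$(sed 1q "$data" | tr ',' '\n' | grep -nxFf "$headers" | cut -d: -f1 | paste -sd, -)
    [ -n "$cols" ] || { echo "no matching headers" >&2; return 1; }
    cut -d, -f "$cols" "$data"
}

# Made-up sample data: HD10 must not be picked up by the HD1 entry.
dir=$(mktemp -d)
printf '%s\n' 'UID,HD1,HD10,HD4' '1,2,33,55' '2,10,14,16' > "$dir/A.txt"
printf '%s\n' UID HD1 HD4 > "$dir/B.txt"
pick_cols "$dir/A.txt" "$dir/B.txt"
```

As with the original pipeline, cut emits the columns in the order they appear in A.txt, not in the order they are listed in B.txt.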


# cat A.txt
UID,HD1,HD2,HD3,HD4
1,2,33,44,55
2,10,14,15,16
# cat B.txt
UID
HD1
HD4

Solution

# awk 'FNR==NR{h[$0]=$0;next}FNR==1{for(;i++<NF;){if($i==h[$i]){o[l++]=i}}}{for(_ in o)printf "%s%s",$o[_],(_==(l-1))?RS:FS}' FS=\, B.txt A.txt
UID,HD1,HD4
1,2,55
2,10,16

That doesn't preserve the desired sequence of columns for longer lists, because awk does not guarantee the order in which for (... in ...) visits array elements (see man awk).

Small adaptation:

awk '
BEGIN           {OFS = FS = ","}
FNR == NR       {h[$0] = NR
                 MX = NR
                 next
                }
FNR == 1        {for (i=1; i<=NF; i++) if ($i in h) a[h[$i]] = i
                }

                {for (i=1; i<=MX; i++) printf "%s%s", $a[i], (i==MX)?RS:OFS
                }
' file2 file1

https://www.gnu.org/software/gawk/manual/html_node/Controlling-Array-Traversal.html

Based on my experience, awk uses incremental traversal by default; maybe I'm wrong.

You're wrong. RudiC is correct. As an example, on OS X version 10.11.3 the command:

printf '%s\n' 1 2 3|awk '{a[$1]}END{for(i in a)print i}'

produces the output:

2
3
1

Using a different version of awk should produce the same lines of output, but the order in which they are printed is explicitly left unspecified by the standards.
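
To make the difference concrete, here is a small sketch contrasting the two loop styles. It is only an illustration; the function names ordered_keys and unordered_keys are made up. Storing values under indices 1..NR and walking them with a counting loop gives a defined order, while for (k in a) does not:

```shell
# ordered_keys (hypothetical name): echo stdin lines back in input order.
# A counting loop over 1..NR visits the elements in a defined sequence.
ordered_keys() {
    awk '{ a[NR] = $1 } END { for (i = 1; i <= NR; i++) print a[i] }'
}

# unordered_keys (hypothetical name): same data walked with for-in;
# the order of the output lines depends on the awk implementation.
unordered_keys() {
    awk '{ a[$1] } END { for (k in a) print k }'
}

printf '%s\n' 3 1 2 | ordered_keys            # prints 3, 1, 2 (input order)
printf '%s\n' 3 1 2 | unordered_keys | sort   # sort imposes a known order afterwards
```

This is why the adapted script loops from 1 to MX instead of using for (... in ...).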

Thanks Guys.

Perfect ....

But it gets stuck when a header is not found in file2.txt...

Does the awk script I suggested in post #5 in this thread get "Stuck" in that case; or does it print a diagnostic message and exit?
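
For anyone who wants to check the missing-header case themselves, here is a rough sketch. It is a variation on the awk approach above, not the exact script from the earlier post: it is wrapped in a hypothetical function, select_by_header, it looks names up in an array instead of using a nested loop, and it sends the diagnostic to standard error instead of standard output. The file bad.txt and the header name NOPE are made up for the test:

```shell
# select_by_header LIST DATA (hypothetical helper): print the columns of DATA
# named in LIST, in LIST order; fail with a message if a name is missing.
select_by_header() {
    awk '
    BEGIN { FS = OFS = "," }
    FNR == NR { want[++n] = $0; next }          # first file: wanted names, in order
    FNR == 1 {                                  # second file, header line
        for (j = 1; j <= NF; j++) pos[$j] = j   # map header name to column number
        for (i = 1; i <= n; i++)
            if (!(want[i] in pos)) {
                printf("Header \"%s\" not found in file \"%s\".\n",
                    want[i], FILENAME) | "cat 1>&2"
                exit 1
            }
    }
    {                                           # every line of the second file
        for (i = 1; i <= n; i++)
            printf("%s%s", $(pos[want[i]]), (i == n) ? ORS : OFS)
    }' "$1" "$2"
}

dir=$(mktemp -d)
printf '%s\n' 'UID,HD1,HD2,HD3,HD4' '1,2,33,44,55' '2,10,14,15,16' > "$dir/A.txt"
printf '%s\n' UID NOPE > "$dir/bad.txt"
select_by_header "$dir/bad.txt" "$dir/A.txt" || echo "exit status: $?"
```

With a header that is missing from A.txt, this prints the diagnostic and returns a non-zero status instead of waiting for input or printing partial lines.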