awk to match keyword and return matches and unique fields

Trying to use awk to find a keyword and return the matching rows, along with $1 and $2, which are the unique IDs but appear only once (on the header line of each group). Thank you :).

file

name	31	Index	Chromosomal Position	Gene	Inheritance
		122	2106725	TSC2	AD
		124	2115481	TSC2	AD
		121	2105400	TSC2	AD
		82	135782221	TSC1	AD
		81	135782026	TSC1	AD
		126	2138218	TSC2	AD
		123	2113107	TSC2	AD
		125	2126142	TSC2	AD
name2	12	Index	Chromosomal Position	Gene	Inheritance
		1	43396568	SLC2A1	AD, AR
name3	20	Index	Chromosomal Position	Gene	Inheritance
		188	2135240	TSC1	AD
		179	2103379	TSC1 AD
		191	2137899	TSC2	AD
		181	2110617	TSC2	AD
		190	2137857	TSC2	AD
		189	2137806	TSC2	AD
		186	2133798	TSC2	AD
		187	2135074	TSC2	AD
		180	2105400	TSC2	AD
		183	2122822	TSC2	AD
		192	2138218	TSC2	AD
		185	2125937	TSC2	AD
		184	2125788	TSC2	AD
		193	2138269	TSC2	AD
		182	2112981	TSC2	AD

Desired output

name	  31	Index	Chromosomal Position	Gene	Inheritance
                  82	135782221	TSC1	AD
                  81	135782026	TSC1	AD
name3  20	Index	Chromosomal Position	Gene	Inheritance
                  188	2135240	TSC1	AD
                  179	2103379	TSC1	AD
                  191	2137899	TSC1	AD           

awk

awk '/TSC1/{ print $1,$2,$0 }' file.txt > output.txt 

This seems to come close to what you said you wanted to do:

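awk '
# A line starting with a letter or digit is a group header: save it and reset the match count.
/^[[:alnum:]]/ {
	h = $0
	np = 0
}
# On a matching data line, print the saved header before the first match only.
$3 == "TSC1" {
	if(np++ == 0)
		print h
	print
}' file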

but with the sample input you provided, it only prints:

name      31    Index   Chromosomal Position    Gene    Inheritance
                  82    135782221       TSC1    AD
                  81    135782026       TSC1    AD

Since TSC1 does not appear anywhere in your input file after name3, I have no idea how you got the rest of the output you said you desired.

As always, if anyone wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk (not nawk for this script).
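For anyone who wants to try the approach above without the full file, here is a minimal self-contained sketch. It assumes tab-separated input like the sample in the question; the file names sample.txt and tsc1.out and the awk variable key are made up for illustration, and the keyword is passed in with -v instead of being hard-coded.

```shell
# Build a small tab-separated sample (hypothetical file name: sample.txt).
printf 'name\t31\tIndex\tChromosomal Position\tGene\tInheritance\n' > sample.txt
printf '\t\t122\t2106725\tTSC2\tAD\n' >> sample.txt
printf '\t\t82\t135782221\tTSC1\tAD\n' >> sample.txt
printf 'name2\t12\tIndex\tChromosomal Position\tGene\tInheritance\n' >> sample.txt
printf '\t\t1\t43396568\tSLC2A1\tAD, AR\n' >> sample.txt
printf 'name3\t20\tIndex\tChromosomal Position\tGene\tInheritance\n' >> sample.txt
printf '\t\t188\t2135240\tTSC1\tAD\n' >> sample.txt

# key is a hypothetical variable name holding the gene to look for.
awk -v key="TSC1" '
/^[[:alnum:]]/ { h = $0; np = 0 }         # remember the latest header line
$3 == key      { if (np++ == 0) print h   # print the header once per group
                 print                    # then the matching data line
               }' sample.txt > tsc1.out

cat tsc1.out
```

Groups with no match (name2 here) print nothing at all, because the saved header is only written when a data line matches.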


I corrected the typo in the input and apologize. Thank you :).

Your updated sample input now has two lines containing TSC1 after name3, but your desired output still has three???

Did my suggestion do what you want done?


I'm not in the office now and will post back tomorrow. I'm sure that will work. Thank you :).

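awk '
NF == 7         {HD = $0 RS}             # header line (7 fields): save it plus a newline
$3 == "TSC1"    {printf "%s%s\n", HD, $0 # print the pending header (if any), then the match
                 HD = ""                 # clear it so the header prints once per group
                }
' file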

Still only two output lines for name3, not three...
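In case the NF == 7 test looks like magic: with the default field separator awk splits on any run of blanks or tabs, so the header lines in this sample come out as seven fields and the data lines as fewer. A quick sketch to check that on three representative lines (the file names nf_demo.txt and nf_demo.out are made up for illustration, and the input is assumed to be tab-separated as in the question):

```shell
# Three representative tab-separated lines (hypothetical file name: nf_demo.txt).
printf 'name2\t12\tIndex\tChromosomal Position\tGene\tInheritance\n' > nf_demo.txt
printf '\t\t1\t43396568\tSLC2A1\tAD, AR\n' >> nf_demo.txt
printf '\t\t82\t135782221\tTSC1\tAD\n' >> nf_demo.txt

# Print the number of fields awk sees on each line, followed by the third field.
awk '{ print NF, $3 }' nf_demo.txt > nf_demo.out

cat nf_demo.out
```

This prints "7 Index", "5 SLC2A1" and "4 TSC1". The header counts seven because "Chromosomal Position" splits into two fields, and "AD, AR" makes that data line five fields instead of four. Leading tabs are ignored, which is why the gene name is $3 on the data lines. The test would need adjusting if a header ever had a different number of words.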


Thank you both, works great.... thank you :).