Help needed with Awk programming

scigeek · August 7, 2012, 3:19pm

Hi Everyone,

I need some help with a data extraction task.
I have a file with about 91000 lines. Scattered among a lot of other information, there are lines like the following:

==================

$V1
$V1$mb
[1] "V862"  "V1052" "V1388" "V1876" "V2803" "V2920" "V3269" "V4770"

$V1$nbr
[1] "V862"  "V1388" "V2803" "V4770"

$V1$parents
[1] "V4770"

$V1$children
[1] "V862"  "V1388" "V2803"

=====================

I need to output only the lines containing $V1$nbr, each together with the line that follows it. For example, I need to extract

$V1$nbr
[1] "V862"  "V1388" "V2803" "V4770"

from this file

It is actually output from a 6000-node Bayes net. I need to do some additional analysis on it, for which I need to know each node (represented by V followed by a number) and its neighbors.

How do I extract only these lines:

$Vnumber$nbr 
"Vnumber1" "Vnumber2"....etc

Can anyone help?

Thanks a lot.

Corona688 · August 7, 2012, 3:22pm

awk '/[$]V1[$]nbr/ { print ; getline ; print }' datafile
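
Since the question asks for every $V<number>$nbr entry rather than only $V1, the same idea can be widened with a regex. This is a minimal sketch, not a tested solution for the real 91000-line file: the file name sample_bn.txt and its contents are made up for illustration, and it assumes each neighbor list fits on a single line.

```shell
# Hypothetical sample in the same shape as the posted output.
cat > sample_bn.txt <<'EOF'
$V1
$V1$mb
[1] "V862"  "V1052" "V1388"

$V1$nbr
[1] "V862"  "V1388" "V2803" "V4770"

$V1$parents
[1] "V4770"

$V2$nbr
[1] "V17"  "V99"
EOF

# Match lines that are exactly $V<number>$nbr (trailing blanks allowed),
# print them, then print the following line. The getline return value is
# checked so that a header on the very last line is not printed twice.
nbr_lines=$(awk '/^[$]V[0-9]+[$]nbr[ \t]*$/ { print; if ((getline) > 0) print }' sample_bn.txt)
printf '%s\n' "$nbr_lines"
```

Anchoring the pattern with ^ and $ keeps it from matching lines that merely contain the text somewhere in the middle.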

vgersh99 · August 7, 2012, 3:26pm

nawk -v s='$V1$nbr' '$1 == s {f=1;print;next} f {print;f=0}' myFile
or
nawk -v s='$V1$nbr' '$1 == s {f=2} --f>=0' myFile
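
A variant of the flag approach, in case a neighbor list is wrapped over several lines (the [1] prefix suggests the tool might do that for long lists, but that is an assumption about the real file). Instead of counting lines, this sketch prints from the header up to the next blank line. The file name demo_wrapped.txt and its contents are invented for the example.

```shell
# Hypothetical sample with a neighbor list wrapped over two lines.
cat > demo_wrapped.txt <<'EOF'
$V7$mb
[1] "V1" "V2"

$V7$nbr
[1] "V10" "V11" "V12"
[4] "V13"

$V7$parents
[1] "V10"
EOF

wrapped=$(awk '
  /^[$]V[0-9]+[$]nbr[ \t]*$/ { f = 1 }   # header line: start printing
  f && NF == 0               { f = 0 }   # blank line: stop printing
  f                                      # print while the flag is set
' demo_wrapped.txt)
printf '%s\n' "$wrapped"
```

This relies on a blank line separating the blocks, as in the posted sample.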

bobbygsk · August 7, 2012, 4:46pm

Useful query!!
The replies show how to print the matched line and the next line, but how do I print the matched line and the n lines following it, taking the example above?

vgersh99 · August 7, 2012, 5:01pm

This prints a matched line and the 3 lines below it; adjust 'l=3' as needed.

nawk -v s='$V1$nbr' -v l=3 '$1==s{f=l+1};f&&f--' myFile
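
To see how the countdown works, here is a small sketch on made-up input. The file demo_n.txt and the START marker are hypothetical, and plain awk is used on the assumption that nawk may not be installed everywhere.

```shell
# Hypothetical input: one marker line followed by numbered lines.
printf '%s\n' 'before' 'START' 'line1' 'line2' 'line3' 'line4' > demo_n.txt

# f is a countdown. A match sets it to l+1 (the match itself plus l lines).
# The pattern "f && f--" is true while f is positive and decrements f each
# time it is evaluated, so l+1 lines are printed after each match.
out=$(awk -v s='START' -v l=3 '$1 == s { f = l + 1 } f && f--' demo_n.txt)
printf '%s\n' "$out"
```

With l=3 this prints START and line1 through line3. A second match inside the window simply resets the countdown.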

Corona688 · August 7, 2012, 5:42pm

vgersh99's second option ought to do what you want. Just set f to n+1 (the matched line plus the n lines after it) instead of 2.

bobbygsk · August 8, 2012, 11:03am

I tried, but I am not getting any output. I use Linux.
I have the following file.

Data.txt
=======
<Credentials type="Oracle">
  <Username>ABC</Username>
  <Password>DEF</Password>
  <Environment>DEV</Environment>
  <System>ABC</System>
  <SID>ABCABC</SID>
</Credentials>
<Credentials type="Informatica">
  <Username>PQR</Username>
  <Password>STU</Password>
  <Environment>DEV</Environment>
  <System>PQR</System>
  <SID>XYZ</SID>
</Credentials>

I tried to get the username, password and environment lines with the following code:

awk '/ABC/ { print ; getline ; print;getline;print }' Data.txt
and
nawk -v s='ABC' -v l=3 '$1==s{f=l+1};f&&f--' Data.txt

vgersh99 · August 8, 2012, 11:43am

If you look at the code, you'll notice that the script compares the FIRST field with the search string: $1==s. In your sample, no line has a first field that is exactly 'ABC', so nothing matches.
A possible change is below. You might want to make the match more restrictive, depending on your objectives:

awk -v s='ABC' -v l=3 '$0 ~ s{f=l+1};f&&f--' myFile
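
As an example of a more restrictive match: in the sample above, a bare 'ABC' regex also hits the System and SID lines, so the countdown keeps getting reset and more lines are printed than wanted. One possible sketch, assuming the goal is the Username line plus the two lines after it, compares the whole first field instead. The file name demo_Data.txt is made up so the example does not overwrite a real file.

```shell
# Copy of the sample data from the post, under a hypothetical name.
cat > demo_Data.txt <<'EOF'
<Credentials type="Oracle">
  <Username>ABC</Username>
  <Password>DEF</Password>
  <Environment>DEV</Environment>
  <System>ABC</System>
  <SID>ABCABC</SID>
</Credentials>
<Credentials type="Informatica">
  <Username>PQR</Username>
  <Password>STU</Password>
  <Environment>DEV</Environment>
  <System>PQR</System>
  <SID>XYZ</SID>
</Credentials>
EOF

# With the default field separator, leading blanks are skipped, so $1 is the
# whole tag. l=2 means the matched line plus the 2 lines that follow it.
creds=$(awk -v s='<Username>ABC</Username>' -v l=2 '$1 == s { f = l + 1 } f && f--' demo_Data.txt)
printf '%s\n' "$creds"
```

This prints the Username, Password and Environment lines of the first block only. For anything beyond simple, regularly formatted files like this one, a real XML parser is likely the safer choice.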