AWK - Extracting matched line

not4google · November 1, 2006, 12:01pm

Hi all,

I have one more query related to AWK. I have the following csv data:

,qwertyA, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,100,200
,,,,,,,,,,,,,,,,,,,300,400
,qwertyB, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,100,200
,,,,,,,,,,,,,,,,,,,300,400

nawk -F"," ' /qwertyA/ 
    {print $0}
'

I need to match the qwertyA row, which it's doing, and then carry on to print the rest of its values, which are the lines with 100, 200 and 300, 400. I'm confused: /qwertyA/ will return the matched line, but how do you go about printing its records, stopping at the next header line, which is qwertyB?

Any suggestions on doing this please?

tmarikle · November 1, 2006, 12:07pm

You can force it with the getline function if the number of lines following qwertyA is constant.

nawk -F"," '/qwertyA/ {   # the { must sit on the same line as the pattern
    print $0; getline; print $0; getline; print $0
}'
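
The fixed-count idea can also be sketched with a counter instead of repeated getline calls, which makes the number of lines easier to change. This is only an illustration: it assumes exactly two data lines follow the header, uses plain awk on the assumption that it behaves like nawk here, and the names sample, out and left are made up for the example.

```shell
# Sample data from the question, held in a shell variable (hypothetical name).
sample=',qwertyA, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,100,200
,,,,,,,,,,,,,,,,,,,300,400
,qwertyB, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,100,200
,,,,,,,,,,,,,,,,,,,300,400'

# Print the matching line plus a fixed number of following lines.
# The count of 3 (header + 2 data lines) is an assumption about the data.
out=$(printf '%s\n' "$sample" | awk -F"," '
/qwertyA/ { left = 3 }        # matched line + 2 following lines
left > 0  { print; left-- }   # print while lines remain in the budget
')

printf '%s\n' "$out"
```

Like the getline version, this only helps when the block length is known in advance.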

vgersh99 · November 1, 2006, 12:13pm

given your sample data, what is the desired output?

not4google · November 1, 2006, 12:20pm

This is the data to work on:

,qwertyA, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,100,200
,,,,,,,,,,,,,,,,,,,300,400
,qwertyB, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,100,200
,,,,,,,,,,,,,,,,,,,300,400

I want to extract:

,qwertyA, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,100,200
,,,,,,,,,,,,,,,,,,,300,400

The number of value lines following qwertyA and qwertyB is not fixed, so getline would not solve this issue,

Thanks again,

vgersh99 · November 1, 2006, 1:36pm

assuming:

you're searching for a string that appears in the SECOND field
any subsequent records to be printed have NO value in the second field

nawk -v pat='qwertyA' -f goog.awk myFile.txt

goog.awk:

BEGIN {
   FS=OFS=","
}
$2 == pat {found=1; print;next}
found && $2 == "" {print;next}
found {found=0}

If the above assumptions are not valid, please elaborate on the 'real life' patterns you have and how YOU would do it manually.
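
To try the script without creating goog.awk and myFile.txt, the same three rules can be inlined and fed the sample data from the question. This is a sketch under the two assumptions listed above; it also assumes plain awk behaves like nawk for this script, and the sample and out variable names are hypothetical.

```shell
# Sample data from the question, held in a shell variable (hypothetical name).
sample=',qwertyA, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,100,200
,,,,,,,,,,,,,,,,,,,300,400
,qwertyB, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,100,200
,,,,,,,,,,,,,,,,,,,300,400'

# Same rules as goog.awk, inlined into one command.
out=$(printf '%s\n' "$sample" | awk -v pat='qwertyA' '
BEGIN { FS = OFS = "," }
$2 == pat         { found = 1; print; next }   # header row: start printing
found && $2 == "" { print; next }              # data row with empty 2nd field: keep printing
found             { found = 0 }                # any other row: stop printing
')

printf '%s\n' "$out"
```

The order of the rules matters: the flag is only cleared when a row is neither the wanted header nor a data row with an empty second field.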

not4google · November 2, 2006, 5:01am

Hi thanks again for the suggestions,

The data to be working on is:

,qwertyA, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,100,200
,,,,,,,,,,,,,,,,,,,300,400
,qwertyB, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,100,200
,,,,,,,,,,,,,,,,,,,300,400
,qwertyC, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,1200,2300
,,,,,,,,,,,,,,,,,,,3200,4400

I want to extract the line with qwertyA and the subsequent 2 lines beneath it (it won't always be 2 lines of data; these are the lines with the ",,,,,,,,").

Doing it manually, I would look for the first instance of the data with qwertyA and then get all the lines beneath it until I come across another line which doesn't contain the list of "," (the lines that are italicised)...
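
That manual procedure could be sketched roughly as below. It is only a guess at how the steps translate: it assumes every data line starts with ",," and every header line does not, uses plain awk in place of nawk, and the sample, out and found names are made up for the example.

```shell
# Sample data from this post, held in a shell variable (hypothetical name).
sample=',qwertyA, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,100,200
,,,,,,,,,,,,,,,,,,,300,400
,qwertyB, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,100,200
,,,,,,,,,,,,,,,,,,,300,400
,qwertyC, field1, field2, field3, field4, field5, field6
,,,,,,,,,,,,,,,,,,,1200,2300
,,,,,,,,,,,,,,,,,,,3200,4400'

# Find the first qwertyA line, print it and the data lines beneath it,
# and stop reading at the first line that is not a data line.
out=$(printf '%s\n' "$sample" | awk '
found && !/^,,/ { exit }               # first non-data line after the match: stop
found           { print; next }        # data line under the matched header
/qwertyA/       { found = 1; print }   # first line mentioning qwertyA
')

printf '%s\n' "$out"
```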

Hope that's a bit clearer,

vgersh99 · November 2, 2006, 7:00am

does the posted script work for you with the mentioned assumptions?

not4google · November 2, 2006, 7:21am

No, I tried it and now I'm not getting anything returned, whereas previously the issue was that all the lines were returned,

vgersh99 · November 2, 2006, 7:32am

strange - it works just fine with your sample data given the noted assumptions.

not4google · November 2, 2006, 10:02am

Sorry, my mistake. Thanks for your help, it does work - I was using a slightly different source file (due to company sensitivity reasons)

THANKS AGAIN,