Fetching a line matching a pattern

jayadanabalan · June 30, 2015, 7:52am

Hi Gurus,

I have a file as follows (a sample is shown below, but the real list is very large)

SCHEDULE WS1#JS1
RUNCYCLE1
:
WS1#JOB1
WS1#JOB2
FOLLOWS JOB1
END

SCHEDULE WS2#JS1
RUNCYCLE2
:
WS1#JOB3
WS1#JOB1
FOLLOWS JOB3
WS2#JOB1
FOLLOWS JOB1
END

Now I have another file as below

WS1#JOB1
WS2#JOB1
WS3#JOB4

Now I need to fetch the schedule names from the first file for the jobs listed in the second file. The output would be as below

WS1#JOB1 WS2#JS1
WS1#JOB1 WS2#JS1
WS2#JOB2 WS2#JS1
WS3#JOB4 Not Available in file1

Kindly help me out with this.

RudiC · June 30, 2015, 8:46am

I don't think your desired output is consistent with your input data, and it has duplicate values in it. So it was difficult to implement something that fits your specification.
Anyhow, try

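awk '
NR==FNR         {T[$1]                  # first file (file2): remember each job name
                 next
                }
/SCHEDULE/      {SCHEDNAM=$2}           # file1: track the current schedule name

$1 in T         {print $1, SCHEDNAM     # wanted job: print it with its schedule
                 delete T[$1]           # so only the first schedule is reported
                }
END             {for (t in T) print t, "not in file1"   # jobs never matched
                }
' file2 file1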
WS1#JOB1 WS1#JS1
WS2#JOB1 WS2#JS1
WS3#JOB4 not in file1

jayadanabalan · June 30, 2015, 11:59pm

Hi Rudic,

Thanks for the reply.

The script works with the example that I gave, but when I try it with a big file, it doesn't seem to work. I am getting "not in file1" for all the data in file2.

RudiC · July 1, 2015, 2:18am

So - the big file doesn't seem to be structurally identical to your samples.

jayadanabalan · July 1, 2015, 2:22am

This is a part of my big file

#Mon-Sun (incl hol)  
 
SCHEDULE A01G3GBOAPP1A#J01AQMSPBATCH01  
ON RUNCYCLE RULE1 "FREQ=WEEKLY;INTERVAL=1;BYDAY=MO,TU,WE,TH,FR,SA,SU" 
UNTIL 1200 +1 DAYS  
PRIORITY 0 
: 
A01G3GBOAPP1A#J01AQMSPSR2PWEB 
 PRIORITY 0 
 
A01G3GBOAPP1A#J01AQMSPSURPWEB 
 PRIORITY 0 
 
A01G3GBOAPP1A#J01AQMSPCUSTSUR 
 PRIORITY 0 
END 
 
#Mon-Sun (incl hol)  
 
SCHEDULE A01G3GBOAPP1A#J01AQMSPBATCH02  
ON RUNCYCLE RULE1 "FREQ=WEEKLY;INTERVAL=1;BYDAY=MO,TU,WE,TH,FR,SA,SU" 
UNTIL 1200 +2 DAYS  
PRIORITY 0 
: 
A01G3GBOAPP1A#J01AQMSPEMAILIN 
 PRIORITY 0 
END

Now if I search for "A01G3GBOAPP1A#J01AQMSPCUSTSUR", I must get "A01G3GBOAPP1A#J01AQMSPBATCH01" as output

A01G3GBOAPP1A#J01AQMSPCUSTSUR A01G3GBOAPP1A#J01AQMSPBATCH01

Likewise, "A01G3GBOAPP1A#J01AQMSPCUSTSUR" can be in multiple schedules as well.
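
Since a job can sit in more than one schedule, note that a script which deletes each job from its lookup array after the first hit will only ever report one schedule per job. The following is a minimal sketch of a variant that prints every schedule containing a wanted job. The function name job_schedules and the array names are made up for this example, and it assumes each schedule block starts with a line whose first field is exactly SCHEDULE.

```shell
#!/bin/sh
# Sketch only: list every schedule that contains a wanted job.
# Usage: job_schedules JOBLIST SCHEDULEFILE   (hypothetical helper name)
job_schedules() {
    awk '
    NR == FNR        { if (NF) wanted[$1]; next }   # first file: remember job names
    $1 == "SCHEDULE" { sched = $2; next }           # second file: track current schedule
    $1 in wanted     { print $1, sched; seen[$1] }  # keep the key so later schedules match too
    END              { for (j in wanted) if (!(j in seen)) print j, "not in file1" }
    ' "$1" "$2"
}

# Demo with the sample data from the first post, written to temporary files.
tmp=$(mktemp -d)
printf '%s\n' 'SCHEDULE WS1#JS1' 'RUNCYCLE1' ':' 'WS1#JOB1' 'WS1#JOB2' \
    'FOLLOWS JOB1' 'END' '' 'SCHEDULE WS2#JS1' 'RUNCYCLE2' ':' 'WS1#JOB3' \
    'WS1#JOB1' 'FOLLOWS JOB3' 'WS2#JOB1' 'FOLLOWS JOB1' 'END' > "$tmp/file1"
printf '%s\n' 'WS1#JOB1' 'WS2#JOB1' 'WS3#JOB4' > "$tmp/file2"
job_schedules "$tmp/file2" "$tmp/file1"
rm -rf "$tmp"
```

With the sample data, WS1#JOB1 is listed twice (once for WS1#JS1 and once for WS2#JS1), WS2#JOB1 once, and WS3#JOB4 is flagged as not in file1. Matching on $1 == "SCHEDULE" rather than a bare /SCHEDULE/ pattern is a design choice here, so that a job name that merely contains the word does not reset the current schedule.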

RudiC · July 1, 2015, 2:37am

And what's in "another file as below" (cf. post#1)?

jayadanabalan · July 1, 2015, 2:49am

The other file contains data like "A01G3GBOAPP1A#J01AQMSPCUSTSU" and there are more than 23000 records. An example is shown below.

A01G3GDB1A#AEACCTBALMSTR
A01G3GDB1A#AEACCTBALMSTR_N
A01G3GDB1A#AEBALANCESHEETFACT
A01G3GDB1A#AEBALANCESHEETFACT_N
A01G3GDB1A#AECO1PREFSHARES

RudiC · July 1, 2015, 2:56am

At first sight, I can't see a match between lines in "other file" and "big file". How do you expect anyone to compose some code when it can't be seriously tested against meaningful samples?

jayadanabalan · July 1, 2015, 3:41am

Sorry, I gave that as a sample. The list is actually huge, so I just took some samples.

Please use the jobs below for testing against the schedules.

A01G3GBOAPP1A#J01AQMSPSR2PWEB
A01G3GBOAPP1A#J01AQMSPEMAILIN
A01G3GWPAPP1A#J01G3GPND2R007

RudiC · July 1, 2015, 7:49am

This is my IMMEDIATE result running my unmodified proposal against your samples:

A01G3GBOAPP1A#J01AQMSPSR2PWEB A01G3GBOAPP1A#J01AQMSPBATCH01
A01G3GBOAPP1A#J01AQMSPEMAILIN A01G3GBOAPP1A#J01AQMSPBATCH02
A01G3GWPAPP1A#J01G3GPND2R007 not in file1

So - where's the problem?

jayadanabalan · July 6, 2015, 5:09am

That is working fine. The issue was with my file1, which was not in the correct format. When I printed $2, it dropped the first 3 characters and returned only the rest of the line.

That was the reason I got "not in file1" for all the records. I extracted the file again and got this done.

Thanks much for the help
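
The thread does not say exactly what was wrong with the first extract of file1. One common reason for every lookup to miss is a stray carriage return at the end of each line (DOS line endings) in one of the files. That is only an assumption here, not the confirmed cause. A minimal sketch of a more defensive version of the script, with a made-up function name, strips a trailing carriage return from every line before comparing:

```shell
#!/bin/sh
# Sketch only: same lookup as in the thread, but tolerant of CRLF line endings.
# Usage: lookup_schedules_crsafe JOBLIST SCHEDULEFILE   (hypothetical helper name)
lookup_schedules_crsafe() {
    awk '
    { sub(/\r$/, "") }                              # drop a trailing CR; fields are re-split
    NR == FNR        { if (NF) T[$1]; next }        # first file: remember job names
    $1 == "SCHEDULE" { SCHEDNAM = $2 }              # second file: track current schedule
    $1 in T          { print $1, SCHEDNAM; delete T[$1] }
    END              { for (t in T) print t, "not in file1" }
    ' "$1" "$2"
}

# Demo: both temporary files deliberately use CRLF line endings.
tmp=$(mktemp -d)
printf 'SCHEDULE WS1#JS1\r\n:\r\nWS1#JOB1\r\nEND\r\n' > "$tmp/file1"
printf 'WS1#JOB1\r\nWS3#JOB4\r\n' > "$tmp/file2"
lookup_schedules_crsafe "$tmp/file2" "$tmp/file1"
rm -rf "$tmp"
```

Running od -c on the first few lines of a suspect file shows whether \r characters or other unexpected bytes are present before the script is blamed.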