AWK Programming Problem

I need development help with a AWK Program.

1
2
3 THIS IS THE INPUT FILE � EACH EMPLOYEE MAY HAVE FROM 1 TO 30 RECORDS
4 INPUT
5 1-4 Project
6 5-12 Employee
7 13-14 Sequence (MAYBE FROM 1 TO 30)
8 15-21 Work Code
9 22-22 Pri Code
10
11 9991|80100001|1|0389999|1|
12 9991|80100001|2|7070007|2|
13 9991|80100001|3|4912100|1|
14 9991|80100001|4|5990000|2|
15 9991|80100001|5|2500200|1|
16 9991|80100001|6|41401|2|
17 9991|80100001|7|515|1|
18 9991|80100001|8|4019|2|
19 9991|80100001|9|56210|1|
20
21 THIS IS THE OUTPUT (OPTION ONE)
22 ( NEED ONE CONTIGUOUS RECORD PER EMPLOYEE )
23 OUTPUT = project,employer, workcode,workcode,...,pricode, pricode,...
24 1586|80100001|0389999|7070007|4912100|599000|25002 00|41401|515|4019|56210|1|2|1|2|1|2|1|2|1|
25
26
27 THIS IS THE OUTPUT (OPTION TWO)
28 ( NEED ONE CONTIGUOUS RECORD PER EMPLOYEE WITH OUTPUT CONTAINING ROOM FOR 30 WORK CODES AND 30 PRI CODES )
29
30 1586|80100001|0389999|7070007|4912100|599000|25002 00|41401|515|4019|56210||||||||||||||||||||||1|2|1 |2|1|2|1|2|1||||||||||||||||||||||
31
thank for the help.

***
Please do not embed formatting of font/color/etc... when posting.

Input file:

9991|80100001|1|0389999|1|
9991|80100001|2|7070007|2|
9991|80100001|3|4912100|1|
9991|80100001|4|5990000|2|
9991|80100001|5|2500200|1|
9991|80100001|6|41401|2|
9991|80100001|7|515|1|
9991|80100001|8|4019|2|
9991|80100001|9|56210|1|

Output 1:

1586|80100001|0389999|7070007|4912100|599000|25002 00|41401|515|4019|56210|1|2|1|2|1|2|1|2|1|

Output 2:

1586|80100001|0389999|7070007|4912100|599000|25002 00|41401|515|4019|56210||||||||||||||||||||||1|2|1 |2|1|2|1|2|1||||||||||||||||||||||

Can One employee work at the same time on 2 projects ?

Yes they can.

---------- Post updated at 08:48 AM ---------- Previous update was at 08:33 AM ----------

They must be in order by their sequence number.

Which fields is the primary key made of ?

Primary key is made up of the project and employee fields.

Try this:

awk -F\| 'p!=$1 OFS $2{if(s)print s t; t=x; s=p=$1 OFS $2}{s=s OFS $4; t=t OFS $5}END{print s t}' OFS=\| infile

If the records are not sorted, then you can sort first:

sort -t\| -k1,3n infile | 
awk -F\| 'p!=$1 OFS $2{if(s)print s t; t=x; s=p=$1 OFS $2}{s=s OFS $4; t=t OFS $5}END{print s t}' OFS=\|

@Scruti

Ok now i am trying to understand ... :smiley:

1) is s=p=$1 equivalent to p=$1; s=p
or is it equivalent to s=p ; p=$1 ?

2) I don't get why the p is mandatory : i have tried to run the following :

# nawk -F\| 's!=$1 OFS $2{if(s)print s t; t=x; s=$1 OFS $2}{s=s OFS $4; t=t OFS $5}END{print s t}' OFS=\| infile
9991|80100001|0389999|1
9991|80100001|7070007|2
9991|80100001|4912100|1
9991|80100001|5990000|2
9991|80100001|2500200|1
9991|80100001|41401|2
9991|80100001|515|1
9991|80100001|4019|2
9991|80100001|56210|1

... but i get a multi line answer instead of 1 line answer ...
what do i miss ?

Thx in advance for your lights

--- The still awk Noobzor, but keeping up trying :smiley: ---

Hi ctsgnb,

s=p=$1 OFS $2 is the same as p=s=$1 OFS $2 . Afterwards both s and p are equal to $1 OFS $2, after the first line this is "9991|90100001"

But where p stays the same, s gets appended with $4's and goes on to grow into the first part of the line to be printed. The second part is t which gets appended with the $5's.

Your version would have worked if you used s!~"^"$1 OFS $2 instead of s!=$1 OFS $2

1 Like

" s=p=$1 OFS $2 is the same as s=p=$1 OFS $2 " really ??? :wink:

With the s=p=$1 i was just wondering if the affectation sequence was processed from right to left order but ok, this is the case so it just equivalent to p=$1 ; s=p so at the end s and p have the same value which in fact quite intuitive regarding to the notation, but i wasn't sure.

Because if the affectation would have been processed from left to right , this would have meant a rotation so that it could have been used for shifting field to the left.

But , ok i got it now ... Thx

Oops.. corrected it.. to p=s... :slight_smile:

Scrutinizer / ctsgnb

Thx you for responding to my request. I'm a novice awk programmer.
If you have the time, could you explain how your solution to the request works? Also how would option two work? Where the output needs 30 work code fields for every employee? If the input record has only ten valid work code fields. but I need to output the ten valid work code fields plus twenty more so that this employee has thirty work code fields in the output file. The invalid fields would be blank.
Again thx you for your assistance.

Scrutinizer,

THX again for your help. I've been out sick for the past days.
I'll try your solution for option 2, and if I have any questions I will get back to you.
THX

In my suggestion I used FS=\| and OFS=\| (not FS=| and OFS=| ) and also $0=$0 and not $0-$0 and I get:

1586|80100001|0389|70707|49121|5990|25002|41401|515|4019|56210|56999|||||||||||||||||||||||||||||||1|2|1|2|1|1|2|2|2|1||||||||||||||||||||||||||||||

Don't you?

Hi Scrutinizer

Yes, I'm getting that results,

for the work code all I should have is a total of 30 fields, the results you are getting is a total of 40 fields.

1586|80100001|0389|70707|49121|5990|25002|41401|515|4019|56210|56999|||||||||||||||||||||||||||||||1|2|3|4|5|6|7|8|9|1|||||||||||||||
|||||||||||||||

The output should be:

 
1586|80100001|0389|70707|49121|5990|25002|41401|515|4019|56210|56999|||||||||||||||||||||1|2|1|2|1|1|2|2|2|1||||||||||||||||
|||||

Same with the pri code fields there should only be a total of 30 fields.

Do I still have some thing wrong?

THX, gator5242

Just replace the 30's with 20's in my solution, or experiment with the right number.

Hi Scrutinizer

All input records will not have the same number of work codes or pri codes.
One input record might have one input work code, then I would need 29 blank field for the output record.
The input number of work codes and pri codes will be different for each employee.
The number of work codes and pri code could also be different for the same employee.
What do I need to change to have it work this way?
THX again for your assisance.

How about this:

awk -F\| 'function pr(){$0=s;NF+=30-m;$0=$0 t;NF+=30-n;print} 
          p!=$1 OFS $2{if(s)pr(); t=x; s=p=$1 OFS $2;m=n=0}
          $4""{s=s OFS $4;m++}
          $5""{t=t OFS $5;n++}
          END{pr()}' OFS=\| infile

Scrutinizer,

The work codes are working there is 30 output fields, but I'm only getting 29 output fields for the pri codes.
What do I need to change?
THX

I would suggest you experiment a little. Change the first 30 to 31 for example and see what it does...

Scrutinizer,

Thanks, changing the second 30 to 31 is working. I will do some more testing now and see what happen.
Thank you, gator5242