Deleting repeated strings in column 2

cgkmal · May 24, 2009, 6:09pm

Hi to all,

I have a file whose subject line contains either "Summarized Availability Report" or only "Summarized Report".
If the subject is "Summarized Availability Report" I want to run Script1 on it, and if the subject is "Summarized Report"
I want to run Script2 on it.

1-) I would like help choosing Script1 when the Subject contains "Summarized Availability Report".
2-) I would like help developing part of this Script1.

In the input file, $2 holds strings with 2 or 3 digits between "_M" and a final letter X-Z.

 
 
Input file example when the Subject contains the string "Availability":
Subject: Summarized Availability Report
Comment             GHH_M55X            May 21 2009 4:45PM 
Comment             GHH_M55Y            May 21 2009 4:45PM
Comment             GHH_M55Z            May 21 2009 4:45PM
Comment             YUP_M19Y            May 18 2009 7:45PM
Comment             YUP_M19Y            May 18 2009 7:45PM
Comment             WON_M123X           May 17 2009 11:22AM
Comment             CET_M123X           May 15 2009 9:12AM

Desired output:
(Script1_part 1: After line containing "Subject:...", delete last letter of strings in $2)
(With my knowledge I got this far:)

awk -F"[X-Z] " '/M[0-9][0-9]|[0-9][X-Z]/ {print $1" "$2}'

 
Subject: Summarized Report
 
Comment             GHH_M55            May 21 2009 4:45PM 
Comment             GHH_M55            May 21 2009 4:45PM 
Comment             GHH_M55            May 21 2009 4:45PM 
Comment             YUP_M19            May 18 2009 7:45PM 
Comment             YUP_M19            May 18 2009 7:45PM 
Comment             WON_M123           May 17 2009 11:22AM
Comment             CET_M123           May 15 2009 9:12AM

(Script1_part 2: After line containing "Subject:...", delete lines with repeated elements in $2)
(In this part I need help, I don't know how to eliminate repeated strings in column 2)

Subject: Summarized Report
Comment             GHH_M55            May 21 2009 4:45PM 
Comment             YUP_M19            May 18 2009 7:45PM
Comment             WON_M123           May 17 2009 11:22AM
Comment             CET_M123           May 15 2009 9:12AM

(Script1_part 3: After line containing "Subject:...", delete $1 and join lines with their Subject line)

 
Final result:
Subject: Summarized Report->GHH_M55 May 21 2009 4:45PM, YUP_M19 May 18 2009 7:45PM, WON_M123 May 17 2009 11:22AM, CET_M123 May 15 2009 9:12AM

Thanks in advance for any help

panyam · May 25, 2009, 2:30am

To remove the repeated lines and print one copy:

awk '/^Comment/ { print $1,substr($2,1,length($2)-1),$3,$4,$5,$6 }' inputfile.txt | uniq -ud
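
A note on the last stage of that pipeline: -u and -d select opposite sets of lines, and as far as I know uniq implementations differ on what the combination prints, so plain uniq is probably the safer way to keep one copy of each line. A minimal sketch, assuming the sample data from the first post (the temporary file is only there to make the example self-contained):

```shell
# Sample data copied from the first post, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
Subject: Summarized Availability Report
Comment             GHH_M55X            May 21 2009 4:45PM
Comment             GHH_M55Y            May 21 2009 4:45PM
Comment             GHH_M55Z            May 21 2009 4:45PM
Comment             YUP_M19Y            May 18 2009 7:45PM
Comment             YUP_M19Y            May 18 2009 7:45PM
Comment             WON_M123X           May 17 2009 11:22AM
Comment             CET_M123X           May 15 2009 9:12AM
EOF

# Drop the last letter of $2, then collapse adjacent duplicate lines.
# Plain uniq only merges neighbouring lines, which is enough here
# because the repeated names sit on consecutive lines.
result=$(awk '/^Comment/ { print $1, substr($2, 1, length($2) - 1), $3, $4, $5, $6 }' "$input" | uniq)
printf '%s\n' "$result"
rm -f "$input"
```

If the repeated names were not on consecutive lines, the output would need a sort before uniq, or the deduplication would have to be done inside awk.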

devtakh · May 25, 2009, 12:55pm

awk 'NR==1{printf("%s-->",$0)}/^Comment/{a[$2]=$2" "$3" "$4" "$5" "$6}END{for (i in a) printf("%s%s", a[i],OFS)}' OFS="," filename

-Devaraj Takhellambam

cgkmal · May 26, 2009, 1:27am

Hey guys, thanks for your help. I tested both solutions, but I would like to
combine them.

With panyam's solution I get unique lines, but not joined like this:

 
Subject: Summarized Report->GHH_M55 May 21 2009 4:45PM, YUP_M19 May 18 2009 7:45PM, WON_M123 May 17 2009 11:22AM, CET_M123 May 15 2009 9:12AM

and with devtakh's solution I get a joined sentence, but it includes repeated items.

In your code I replaced the part

a[$2]=$2" "$3...

to

a[$2]=substr($2,1,length($2)-1)" "$3...

But from here I'm not sure how to present unique lines in a joined sentence.

One more thing:

Assuming I have 2 scripts, how do I choose Script1 if "Subject" contains "Summarized Availability Report"?

Thanks again,

Best regards

panyam · May 26, 2009, 2:36am

But from here I'm not sure how to present unique lines in a joined sentence.

use

"uniq -ud"

to get a single copy of the repeated lines.

Assuming I have 2 scripts, how do I choose Script1 if "Subject" contains "Summarized Availability Report"?

That can be done with a conditional check.
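
One way such a conditional check might look, as a rough sketch: grep -q tests the Subject line and the exit status decides which script runs. The functions script1 and script2 below are hypothetical stand-ins for the two real scripts, and choose_script is just a name picked for this example.

```shell
# Hypothetical stand-ins for the two real scripts.
script1() { echo "Script1 on $1"; }
script2() { echo "Script2 on $1"; }

choose_script() {
    # grep -q prints nothing; it exits 0 if the pattern matches.
    if grep -q '^Subject:.*Summarized Availability Report' "$1"; then
        script1 "$1"
    else
        script2 "$1"
    fi
}

# Two tiny sample files, one per subject type.
avail=$(mktemp)
plain=$(mktemp)
printf 'Subject: Summarized Availability Report\n' > "$avail"
printf 'Subject: Summarized Report\n' > "$plain"

choose_script "$avail"
choose_script "$plain"
```

The pattern must include the word "Availability"; matching only "Summarized" would select Script1 for both kinds of file.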

vidyadhar85 · May 26, 2009, 3:36am

Try devtakh's solution with a small modification:

awk 'NR==1{printf("%s-->",$0)}/^Comment/{a[substr($2,1,length($2)-1)]=$2" "$3" "$4" "$5" "$6}END{for (i in a) printf("%s%s", a[i],OFS)}' OFS="," filename
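
Two things to keep in mind with the array approach: for (i in a) visits the keys in an unspecified order, and the stored value still carries the trailing letter of $2. A possible variant, sketched under the assumption that the input looks like the sample in the first post, prints each name the first time it is seen, so the original order is kept and the letter is dropped. The names seen, key and sep are chosen only for this example, and the Subject line is printed unchanged.

```shell
# Sample data copied from the first post, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
Subject: Summarized Availability Report
Comment             GHH_M55X            May 21 2009 4:45PM
Comment             GHH_M55Y            May 21 2009 4:45PM
Comment             GHH_M55Z            May 21 2009 4:45PM
Comment             YUP_M19Y            May 18 2009 7:45PM
Comment             YUP_M19Y            May 18 2009 7:45PM
Comment             WON_M123X           May 17 2009 11:22AM
Comment             CET_M123X           May 15 2009 9:12AM
EOF

joined=$(awk '
/^Subject:/ { printf "%s->", $0; next }        # start the joined line
/^Comment/ {
    key = substr($2, 1, length($2) - 1)        # $2 without its last letter
    if (!(key in seen)) {                      # first time this name appears
        seen[key] = 1
        printf "%s%s %s %s %s %s", sep, key, $3, $4, $5, $6
        sep = ", "                             # separator for later items
    }
}
END { print "" }                               # finish with a newline
' "$input")
printf '%s\n' "$joined"
rm -f "$input"
```

Because the first occurrence of each name wins, the date shown for a repeated name is the one from its first line.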