Deleting repeated lines by keeping only one.

anushree.a · September 25, 2012, 5:52am

Dear Buddies,
I need your help once again.
I have a flat file with around 20 million lines (it is a huge file). Many of the lines are of no use, so I want to remove them. Each line starts with a code, and we can use those codes to decide which lines to delete. However, it is not as easy as it looks.
Here is an example.

consumer_list.temp

Name: Anushree Aggarwal 
Tel1: 022-42158473
Tel2: 9965821475
Add1: Blah blah blah blah
Add2: Blah blah
Add3: Blah blah blah
Gndr: Female
Name: Rucha Chheda 
Tel1: 022-42158499
Tel2: 8325698501
Add1: Blah blah  
Add2: Blah blah blah
Add3: Blah blah blah blah
Gndr: Female
Name: Priyanka Rathi 
Tel1: 022-42158482
Tel2: 9658231492
Tel3: 021-23654125
Add1: Blah blah blah blah
Add2: Blah blah
Add3: Blah blah blah
Add4: Blah blah 
Gndr: Female

In the above example, multiple telephone numbers and addresses are given. In the output I want to keep only one telephone number and one address, however many are provided.

The output should be as follows.

consumer_sorted_list.temp

Name: Anushree Aggarwal 
Tel1: 022-42158473
Add1: Blah blah blah blah
Gndr: Female
Name: Rucha Chheda 
Tel1: 022-42158499
Add1: Blah blah  
Gndr: Female
Name: Priyanka Rathi 
Tel1: 022-42158482
Add1: Blah blah blah blah
Gndr: Female

In short, whatever information is provided between "Name" and "Gndr" should appear only once. I am unable to think of the logic.
I need your help.
Thanks
Anu.

pamu · September 25, 2012, 6:02am

Try this:

 awk '/^Name:/ || /^Tel1:/ || /^Add1:/ || /^Gndr:/' file

EDIT:

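After looking at bmk's solution: the awk command above does the same job as egrep. Anchoring the pattern to the start of the line avoids matching these words inside address text:

 grep -E '^(Name|Tel1|Add1|Gndr):' file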

bmk · September 25, 2012, 6:12am

Try something like this (the -r option is not needed for a single file, and \| is a GNU grep extension):

grep "Name\|Tel1\|Add1\|Gndr" test1.txt

---------- Post updated at 05:12 AM ---------- Previous update was at 05:08 AM ----------

It's the same as:

egrep 'Name:|Tel1:|Add1:|Gndr:' test1.txt
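
A note on the one-liners above: they keep only lines labelled Tel1 and Add1, so a record whose first telephone line happens to be labelled Tel2 would lose its number. If that can occur in the real file (an assumption, since the posted sample always starts at Tel1 and Add1), a minimal sketch that keeps the first line of each label type per record could look like the following. The sample data is a shortened, hypothetical variant of the posted example, and the rule "strip trailing digits from the label" is an assumption about how the labels are formed.

```shell
# Hypothetical sample input, shortened from the example in the question.
# The second record deliberately starts at Tel2 to show the difference.
cat > consumer_list.temp <<'EOF'
Name: Anushree Aggarwal
Tel1: 022-42158473
Tel2: 9965821475
Add1: Blah blah blah blah
Add2: Blah blah
Gndr: Female
Name: Priyanka Rathi
Tel2: 9658231492
Tel3: 021-23654125
Add1: Blah blah blah blah
Add4: Blah blah
Gndr: Female
EOF

# Key each line on its label with trailing digits stripped (Tel1 -> Tel).
# A "Name:" line starts a new record, so the seen[] table is emptied there.
awk -F: '
    /^Name:/ { split("", seen) }      # portable way to clear an array
    {
        key = $1
        sub(/[0-9]+$/, "", key)       # Tel1, Tel2, Tel3 all become "Tel"
        if (!(key in seen)) {         # first line of this type in the record
            seen[key] = 1
            print
        }
    }
' consumer_list.temp > consumer_sorted_list.temp

cat consumer_sorted_list.temp
```

This reads the file once and only remembers the labels of the current record, so memory use should stay small even for 20 million lines. If every record is guaranteed to start at Tel1 and Add1, the plain grep or awk filters above are simpler and do the same job.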