Friends,
I have a .txt file with the following format.
START
ABC|Prashant1|Patel1
ABC|Prashant2|Patel2
ABC|Prashant1|Patel1
ABC|Prashant2|Patel2
END
I would like to do:
1) Delete line with START
2) Delete line with END
3) Remove ABC|
4) Delete duplicate records
The following command works fine; it deletes the lines with START and END:
sed -e '/^START/d' -e '/^END/d' Filename.txt
How do I incorporate tasks 3 and 4?
NOTE: The file will have more than 500,000 rows.
Thanks in advance for any suggestions,
Prashant
nawk -F'|' 'NF==3{print $2,$3}' patel
This code does not remove duplicates, and it also drops the pipe delimiter.
Use this:
grep "|" test | sed 's/^ABC|//g' | sort -u
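To see what this pipeline does, here is a minimal sketch run against the sample from the question. The file name sample.txt and the scratch directory are assumptions made for illustration only. Note that sort -u returns the lines in sorted order, not in their original file order.

```shell
# Build the sample file from the question in a scratch directory (hypothetical path).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.txt" <<'EOF'
START
ABC|Prashant1|Patel1
ABC|Prashant2|Patel2
ABC|Prashant1|Patel1
ABC|Prashant2|Patel2
END
EOF

# Keep only lines containing a pipe (this drops START and END), strip the
# leading "ABC|", then sort and remove duplicate lines.
result=$(grep '|' "$tmpdir/sample.txt" | sed 's/^ABC|//' | sort -u)
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

On this sample the two distinct records remain, each with its pipe delimiter intact.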
Tostay2003:
What if the data doesn't start with ABC? Then your logic fails.
Use this:
nawk -F'|' 'NF==3{print $2,$3}' patel | sort -u
I assumed from the author's description that the first field is the same on every line of the file.
Use this if you didn't mean that the first field would always be the same.
grep "|" test | cut -d'|' -f2,3 | sort -u
Or, with a slight amendment to the code written by zenith, i.e. by adding OFS:
nawk -F'|' 'NF==3 && !a[$2,$3]++ {print $2,$3}' patel
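The !a[$2,$3]++ test is a common awk idiom: the array entry is 0 the first time a key is seen, so the negation is true and the line is printed; the post-increment then makes every later occurrence fail the test. Unlike sort -u, this keeps records in first-seen order and needs no sort, at the cost of holding every distinct key in memory. A small sketch follows, assuming a POSIX awk in place of nawk and a hypothetical input whose records are deliberately out of sorted order.

```shell
# Hypothetical input with the records out of sorted order.
input='START
ABC|Prashant2|Patel2
ABC|Prashant1|Patel1
ABC|Prashant2|Patel2
END'

# NF==3 skips START and END (one field each). a[$2,$3]++ evaluates to 0 on
# first sight of a pair, so !a[$2,$3]++ is true only for the first occurrence.
# With the default OFS, the two printed fields are separated by a space.
dedup=$(printf '%s\n' "$input" | awk -F'|' 'NF==3 && !a[$2,$3]++ {print $2, $3}')
printf '%s\n' "$dedup"
```

Here Prashant2 is printed before Prashant1, because first-seen order is preserved rather than sorted.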
Thank you all for your replies.
nawk -F'|' 'NF==3 && !a[$2,$3]++ {print $2,$3}' patel
I made a small change to the print statement because it was not printing the | symbol.
nawk -F'|' 'NF==3 && !a[$2,$3]++ {print $2,"|",$3}' patel
However, it prints a space before and after the | symbol.
Thanks,
Prashant
nawk -F'|' 'NF==3 && !a[$2,$3]++ {print $2, $3}' OFS='|' patel
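The reason this works is that each comma in an awk print statement is replaced by the output field separator OFS, which defaults to a single space. Passing "|" as its own argument therefore pads it with spaces, while setting OFS to | makes the comma itself emit the pipe. The sketch below compares the variants on one line; it assumes a POSIX awk in place of nawk, and uses -v OFS='|' because the input comes from a pipe, which has the same effect for this program as the trailing OFS='|' assignment.

```shell
line='ABC|Prashant1|Patel1'

# Commas in print are replaced by OFS (a single space by default),
# so a literal "|" argument ends up with a space on each side.
spaced=$(printf '%s\n' "$line" | awk -F'|' '{print $2, "|", $3}')

# Setting OFS makes the comma itself emit the pipe.
with_ofs=$(printf '%s\n' "$line" | awk -F'|' -v OFS='|' '{print $2, $3}')

# String concatenation (no commas) bypasses OFS entirely.
concat=$(printf '%s\n' "$line" | awk -F'|' '{print $2 "|" $3}')

printf '%s\n' "$spaced" "$with_ofs" "$concat"
```

Either of the last two forms produces Prashant1|Patel1 without the extra spaces.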