Hi,
In a file, I have to mark duplicate records as 'D' and the latest record alone as 'C'.
In the below file, I have to identify if duplicate records are there or not based on Man_ID, Man_DT, Ship_ID and I have to mark the record with latest Ship_DT as "C" and other as "D" (I have to create a new field at the end of the records as "C" or "D")
File 1
====
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2009-11-31|111|2|Jackets
001|2010-12-31|11|2011-12-31|111|2|Jackets
003|2010-11-01|13|2011-12-31|111|1|Shoes
Expected Output
File 1
=====
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets|D
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2009-11-31|111|2|Jackets|D
001|2010-12-31|11|2011-12-31|111|2|Jackets|C
003|2010-11-01|13|2011-12-31|111|1|Shoes
hi
you have provided the wrong input.
The output is not according to the input
awk '{s=$1 FS $2 FS $3}
NR==FNR{a++;b=FNR;next}
FNR==1{print;next}
{if (a<2)
{print}
else
{print (b==FNR)?$0 "|C":$0 "|D"}}' FS=\| OFS=\| infile infile
2 Likes
Thanks, rdcwayx!! Could please explain the code? It will be very helpful
hi rdcwayx:
your code does not work in this case:
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets
001|2010-12-31|11|2011-12-31|111|2|Jackets
003|2010-11-01|13|2011-12-31|111|1|Shoes
---------- Post updated at 11:38 AM ---------- Previous update was at 11:36 AM ----------
According to this data the output should be-:
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets|D
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets|C
001|2010-12-31|11|2011-12-31|111|2|Jackets|D
003|2010-11-01|13|2011-12-31|111|1|Shoes
---------- Post updated at 11:41 AM ---------- Previous update was at 11:38 AM ----------
and from your code the output is coming:
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets|D
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets|D
001|2010-12-31|11|2011-12-31|111|2|Jackets|C
003|2010-11-01|13|2011-12-31|111|1|Shoes
1 Like
oh!!! am sorry...I missed it...My bad. Yes the code works as what parthmittal says!
parthmittal2007:
hi rdcwayx:
your code does not work in this case:
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets
001|2010-12-31|11|2011-12-31|111|2|Jackets
003|2010-11-01|13|2011-12-31|111|1|Shoes
---------- Post updated at 11:38 AM ---------- Previous update was at 11:36 AM ----------
According to this data the output should be-:
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets|D
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets|C
001|2010-12-31|11|2011-12-31|111|2|Jackets|D
003|2010-11-01|13|2011-12-31|111|1|Shoes
---------- Post updated at 11:41 AM ---------- Previous update was at 11:38 AM ----------
and from your code the output is coming:
Man_ID|Man_Dt|Ship_Id|Ship_Dt|ItemID|Noof ITEMS|ItemNam
001|2010-12-31|11|2010-12-31|111|2|Jackets|D
002|2010-12-31|12|2010-12-31|111|1|Caps
001|2010-12-31|11|2012-11-31|111|2|Jackets|D
001|2010-12-31|11|2011-12-31|111|2|Jackets|C
003|2010-11-01|13|2011-12-31|111|1|Shoes
Update the code to check the latest date.
awk '{s=$1 FS $2 FS $3}
NR==FNR{a++;d=$4;gsub(/-/,"",d);max=(max>d)?max:d;if (max==d) b=FNR;next}
FNR==1{print;next}
{if (a<2)
{print}
else
{print (b==FNR)?$0 "|C":$0 "|D"}}' FS=\| OFS=\| infile infile
1 Like