Hi,
I have a pipe-separated file, repo.psv, from which I need to remove duplicates based on the 1st column only. Can anyone help with a Unix script?
Input:
15277105||Common Stick|ESHR||Common Stock|CYRO AB
15277105||Common Stick|ESHR||Common Stock|CYRO AB
16111278||Common Stick|ESHR||Common Stock|STANDARD REGISTER CO
39693766||Common Stick|ESHR||Common Stock|HS AG
Output should be:
15277105||Common Stick|ESHR||Common Stock|CYRO AB
16111278||Common Stick|ESHR||Common Stock|STANDARD REGISTER CO
39693766||Common Stick|ESHR||Common Stock|HS AG
The OP asked to remove duplicates from a pipe-separated file based on the 1st column only.
The solution above uses whitespace as the field separator, not the pipe (|), so $1 becomes 15277105||Common instead of the correct 15277105.
Correct solution (setting field separator to pipe):
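The command itself is not shown here. Going by the explanation of L, $1, ! and ++ later in the thread, it was presumably the usual awk one-liner with the field separator set to the pipe. A minimal sketch under that assumption, run against a small sample file (sample.psv and sample_dedup.psv are names made up for the demo) that holds the OP's input:

```shell
# Sample data copied from the question, written to a demo file.
cat > sample.psv <<'EOF'
15277105||Common Stick|ESHR||Common Stock|CYRO AB
15277105||Common Stick|ESHR||Common Stock|CYRO AB
16111278||Common Stick|ESHR||Common Stock|STANDARD REGISTER CO
39693766||Common Stick|ESHR||Common Stock|HS AG
EOF

# -F'|' makes awk split on the pipe, so $1 is just the ID.
# !L[$1]++ is true only the first time a given $1 is seen,
# and the default action for a true pattern is to print the line.
awk -F'|' '!L[$1]++' sample.psv > sample_dedup.psv
cat sample_dedup.psv
```

For the real file, replace sample.psv with repo.psv. The first line seen for each ID is kept and the input order is preserved, so no sort is needed.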
Hi,
My .psv file is getting bigger as more columns are added. I remove duplicates based on the last column, which is currently at position 176. The column name is 'auditid'. Is there a way to find the column number of this field by name and use it as the array index?
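One way to do this, sketched under the assumption that the first line of the file is a header row containing the column names (the post does not say so explicitly): scan the header for 'auditid', remember its position, and use that position as the key. The sample file and its three columns are made up for the demo.

```shell
# Hypothetical sample with a header row; the real file has far more columns.
cat > sample_hdr.psv <<'EOF'
id|name|auditid
1|alpha|A100
2|beta|A100
3|gamma|A200
EOF

awk -F'|' -v name='auditid' '
NR == 1 {
    # Scan the header row for the wanted column name.
    for (i = 1; i <= NF; i++) if ($i == name) col = i
    if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
    print          # keep the header line in the output
    next
}
!L[$col]++         # print a row only the first time its key is seen
' sample_hdr.psv > sample_hdr_dedup.psv
cat sample_hdr_dedup.psv
```

If auditid is always the last column, there is no need to look it up at all: awk -F'|' '!L[$NF]++' file keys on the last field however many columns are added.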
L is just an arbitrary name for an array; call it what you like: Joe, Mimi, or L (short for logical). $1 (the first field of each row/line) is the index into that array, and the indexed element is incremented by ++. Any value except 0 (or empty, which is equivalent) makes the reference true, and its inversion (by !) false. As the default action is print, the entire command reads: Get the array element for index $1. If it does not exist (= first occurrence of this index), invert to true and print. If it does exist (and has a value), invert to false and don't print. Either way, increment it so that later occurrences are suppressed.
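The same logic written out long-hand may make the steps easier to follow. This is only an illustrative expansion of the one-liner, with a tiny made-up demo file:

```shell
# Three demo rows; the key "a" appears twice.
printf '%s\n' 'a|1' 'a|2' 'b|3' > demo.psv

awk -F'|' '
{
    if (L[$1] == 0) {   # an unset element compares equal to 0: first occurrence
        print           # so print the whole line
    }
    L[$1]++             # count this key so later rows with it are suppressed
}' demo.psv > demo_dedup.psv
cat demo_dedup.psv
```

Only the first row for each key is printed, so a|2 is dropped.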