Hello friends!
Each line of my input file has this format:
word<TAB>tag1<SPACE>lemma<TAB>tag2<SPACE>lemma ... <TAB>tagN<SPACE>lemma
In each line I need to remove the repeated tags (of the same word), as in the example below, while keeping all the lemmata attached to each tag, concatenated with a "|" separator.
My INPUT (sample):
abecedaria ADJ abecedarius ADJ:abl abecedarius N:abl abecedaria N:abl abecedarium N:acc abecedaria N:acc abecedarium N:nom abecedaria N:nom abecedarium N:voc abecedaria N:voc abecedarium
abecedariabus N:abl abecedaria N:dat abecedaria
abhorruerimus V:IND abhorreo V:IND abhorresco V:SUB abhorreo V:SUB abhorresco
abhorrueritis V:IND abhorreo V:IND abhorresco V:SUB abhorreo V:SUB abhorresco
abhorruero V:IND abhorreo V:IND abhorresco
Desired OUTPUT:
abecedaria ADJ abecedarius ADJ:abl abecedarius N:abl abecedaria|abecedarium N:acc abecedaria|abecedarium N:nom abecedaria|abecedarium N:voc abecedaria|abecedarium
abecedariabus N:abl abecedaria N:dat abecedaria
abhorruerimus V:IND abhorreo|abhorresco V:SUB abhorreo|abhorresco
abhorrueritis V:IND abhorreo|abhorresco V:SUB abhorreo|abhorresco
abhorruero V:IND abhorreo|abhorresco
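Here is a minimal Python sketch of one way to do this. It assumes that the first whitespace-separated token on each line is the word and that the remaining tokens alternate tag, lemma, tag, lemma (no spaces inside a tag or a lemma). Because it splits on any whitespace, it works whether the separators are TABs, as in the format described above, or plain spaces, as in the pasted sample. Each tag stays at the position where it first appears, and an exact duplicate tag+lemma pair is kept only once. The names `merge_line` and `merge_file` are just illustrative.

```python
def merge_line(line: str) -> str:
    """Collapse repeated tags on one line, joining their lemmata with '|'.

    Assumption: the first whitespace-separated token is the word and the
    rest alternate tag, lemma, tag, lemma, ... (no spaces inside either).
    """
    tokens = line.split()  # splits on TABs, spaces and the trailing newline
    if not tokens:
        return ""
    word, rest = tokens[0], tokens[1:]
    if len(rest) % 2:
        raise ValueError(f"odd number of tag/lemma tokens: {line!r}")
    # dicts keep insertion order, so tags come out in first-seen order
    merged: dict = {}
    for tag, lemma in zip(rest[0::2], rest[1::2]):
        lemmata = merged.setdefault(tag, [])
        if lemma not in lemmata:  # drop exact duplicate tag+lemma pairs
            lemmata.append(lemma)
    # Output layout: word<TAB>tag<SPACE>lemma1|lemma2<TAB>...
    fields = [f"{tag} {'|'.join(lemmata)}" for tag, lemmata in merged.items()]
    return "\t".join([word] + fields)


def merge_file(in_path: str, out_path: str) -> None:
    """Apply merge_line to every non-blank line of in_path, writing out_path."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(merge_line(line) + "\n")


if __name__ == "__main__":
    # Quick check on two lines from the sample above
    sample = [
        "abecedariabus N:abl abecedaria N:dat abecedaria",
        "abhorruero V:IND abhorreo V:IND abhorresco",
    ]
    for line in sample:
        print(merge_line(line))
```

To process a whole file, call something like `merge_file("input.txt", "output.txt")` (hypothetical file names). The fields are joined with a TAB and each tag is separated from its lemmata by a single space, to match the format described above; change the `"\t"` in `merge_line` if you want plain spaces instead.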
Very grateful to anyone who can help me!
mjomba from Tanzania