Is it possible to remove all duplicate lines from all txt files in a specific folder?
This is too hard for me; maybe someone could help.
Let's say we have some number of text files: 1, 2, 3, ... up to a maximum of 50.
Each text file has lines of text.
I want all lines of all text files taken together to be unique, but the non-duplicate lines must remain in the txt file where they are.
It does not matter in which txt file the duplicate lines are deleted, but one occurrence has to stay in at least one txt file... An even better solution would delete the duplicate occurrences first in text file 1, then in 2, then in 3, so that the deleted lines are spread over all the txt files.
Example with 4 text files (the number can vary, up to 50); we also do not know how many lines each file has.
This could be done with awk, but to simplify its work I believe it would be a good idea to first combine all the files into one file in such a way that all the original information is retained:
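For example (a sketch; "combined" is just a scratch-file name I made up here, and it assumes the lines themselves never contain a tab character):

    # Tag each line with the name of the file it came from,
    # separated by a tab:
    awk '{ print FILENAME "\t" $0 }' *.txt > combined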
This way you have the original filename in the first column. The file can be sorted on the second column; then you can apply an awk program that appends each field $2 as a line to a file named after field $1, but only if $2 did not appear on the previous input line.
The delete operations would be automatically spread over the filenames.
Now you are wondering how to combine the files, sort the result, and process it.
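Something along these lines should work (a sketch, untested; again it assumes tab-free lines, and "combined" is my scratch-file name):

    # 1) Combine: tag each line with its source file name.
    awk '{ print FILENAME "\t" $0 }' *.txt > combined

    # 2) Sort on the line text (field 2) so duplicates become adjacent.
    # 3) Process: write the first occurrence of each line back to the
    #    file it came from; later (duplicate) occurrences are skipped.
    sort -t "$(printf '\t')" -k2 combined |
    awk -F '\t' 'NR == 1 || $2 != prev { print $2 > $1 } { prev = $2 }'

Because the stream is sorted on the line text, each duplicate is immediately preceded by an identical line, so comparing against prev is enough to detect it.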
You should not use >> unless you want to preserve what was in the file before the awk script runs; you should use the > operator. In awk, > truncates the output file only on the first write and appends on later writes.
Again, if there are many output files, you should close them, or else, due to the OS limit on open file descriptors, you may get errors. It is always a good idea to close them explicitly:
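Here is a sketch of the processing step with explicit close() calls. One catch: reopening a file with > truncates it again, so the program has to remember which files it has already written to and reopen those with >>:

    sort -t "$(printf '\t')" -k2 combined |
    awk -F '\t' '
        NR == 1 || $2 != prev {
            if (seen[$1]++)
                print $2 >> $1   # reopened after close(): append
            else
                print $2 > $1    # first write: truncate old contents
            close($1)            # keep the open file count low
        }
        { prev = $2 }
    '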