$ cat someStrings.txt
GRANT select on MANHPRD.S_PROD_INT TO OR_PHIL;
GRANT select on MANHPRD.S_PROD_INT TO OR_PHIL;
GRANT select on SCOTT.emp to JOHN;
grant select on scott.emp to john;
grant select on scott.dept to hr;
If you ignore case and extra spaces between the words, there are only 3 distinct lines in the above .txt file, and they are:
### Distinct output
GRANT select on MANHPRD.S_PROD_INT TO OR_PHIL;
GRANT select on SCOTT.emp to JOHN;
grant select on scott.dept to hr;
How can I remove the duplicate lines while ignoring case and extra spaces between the words, so that I get the distinct output above?
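The matching rule in the question amounts to comparing lines by a normalized key: whitespace runs collapsed to a single space, letters folded to one case. A minimal Python sketch, assuming "extra space" means any run of whitespace between words. The helper name `normalize_key` is hypothetical, and the extra spaces in row 2 are an assumption, because the original spacing was lost from the post.

```python
def normalize_key(line: str) -> str:
    """Collapse runs of whitespace to one space and ignore letter case."""
    return " ".join(line.split()).upper()

sample = [
    "GRANT select on MANHPRD.S_PROD_INT TO OR_PHIL;",
    "GRANT  select on MANHPRD.S_PROD_INT   TO OR_PHIL;",  # hypothetical spacing for row 2
    "GRANT select on SCOTT.emp to JOHN;",
    "grant select on scott.emp to john;",
    "grant select on scott.dept to hr;",
]

keys = {normalize_key(s) for s in sample}
print(len(keys))  # 3
```

Two lines count as duplicates exactly when their keys are equal, which is why the five input lines reduce to three.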
However, try:
awk '
{(gsub(/ +/," "))}
!T[toupper($0)]++
' file
GRANT select on MANHPRD.S_PROD_INT TO OR_PHIL;
GRANT select on SCOTT.emp to JOHN;
grant select on scott.dept to hr;
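The regex `/ +/` in RudiC's program matches only the space character. Lines that differ in the number of spaces between words are merged, but lines that differ by a tab or a leading or trailing space are not. Below is a small Python sketch of that difference. It uses `re.sub(r" +", " ", ...)` to mirror `gsub(/ +/," ")`. The function names and the second test line (with a leading space, a tab and a double space) are hypothetical and do not come from the thread.

```python
import re

def awk_style(line: str) -> str:
    # Same effect as awk's gsub(/ +/," "): each run of spaces becomes one
    # space; tabs and leading/trailing spaces are left as they are.
    return re.sub(r" +", " ", line)

def any_whitespace(line: str) -> str:
    # Stricter normalization: split on any whitespace (spaces, tabs),
    # which also drops leading and trailing whitespace.
    return " ".join(line.split())

a = "grant select on scott.dept to hr;"
b = " grant select\ton scott.dept  to hr;"  # hypothetical: leading space, tab, double space

print(awk_style(a).upper() == awk_style(b).upper())            # False
print(any_whitespace(a).upper() == any_whitespace(b).upper())  # True
```

If the real files can contain tabs, the stricter behaviour is also available in awk itself. Replacing the gsub step with `{$1=$1}` rebuilds `$0` from its fields, joined by the default single-space OFS.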
Thank you very much, RudiC. Your command works (although I didn't understand any of it).
I need to do some googling on the basics of awk.
It can also be written on one line, as shown below. Right?
# awk '{(gsub(/ +/," "))}!T[toupper($0)]++' someStrings.txt
GRANT select on MANHPRD.S_PROD_INT TO OR_PHIL;
GRANT select on SCOTT.emp to JOHN;
grant select on scott.dept to hr;
seen = set()
with open("a.txt") as file:
    for line in file:
        normalized = " ".join(line.split())  # collapse runs of whitespace, drop the newline
        key = normalized.lower()             # compare case-insensitively
        if key not in seen:
            print(normalized)                # print the first occurrence in its original case
            seen.add(key)
Following is an explanation of the command posted by RudiC sir.
awk '
{(gsub(/ +/," "))}   ##### gsub() is the global substitution function. The regex / +/ matches a run of
                     ##### one or more spaces, and each run is replaced by a single space. In your input,
                     ##### row 2 has extra spaces between the words; after this step it matches row 1.
!T[toupper($0)]++    ##### toupper() is an awk built-in function that converts a string to upper case.
                     ##### The upper-cased line is used as the index into an array named T.
                     ##### T[...]++ returns the old count and then increments it. For the first
                     ##### occurrence of a line the old count is unset (treated as 0), so !0 is TRUE;
                     ##### for any later occurrence it is 1 or more, so the condition is FALSE.
                     ##### awk programs are made of condition {action} pairs. When a condition is TRUE
                     ##### and no action is given, the default action is to print the line. Hence each
                     ##### line is printed only the first time it is seen.
' file               ##### name of the input file
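To make the `!T[toupper($0)]++` idiom concrete, here is a Python sketch that mirrors the awk program one step at a time. `first_occurrences` is a hypothetical name. `defaultdict(int)` stands in for awk arrays, whose unset elements evaluate to 0 in a numeric context.

```python
import re
from collections import defaultdict

def first_occurrences(lines):
    """Yield each line the first time its normalized form is seen."""
    # T plays the role of the awk array: a missing key starts at 0.
    T = defaultdict(int)
    for line in lines:
        line = re.sub(r" +", " ", line)  # {(gsub(/ +/," "))}
        key = line.upper()               # toupper($0)
        old = T[key]                     # T[key]++ evaluates to the old value...
        T[key] += 1                      # ...and then increments the count
        if not old:                      # the leading ! is true only when old == 0
            yield line                   # bare pattern, so the default action: print

lines = [
    "GRANT select on SCOTT.emp to JOHN;",
    "grant select on scott.emp to john;",
    "grant select on scott.dept to hr;",
]
for line in first_occurrences(lines):
    print(line)
```

This prints the first and third lines. The second line's key was already counted once, so its old value is 1 and the negated condition is false.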