Shell script to get duplicate string

Hi All,

I need to get a count of the duplicate strings along with each unique error message. Below is my file:

 Rejected - Error on table TableA, column ColA.
Error String 1.
 Rejected - Error on table TableA, column ColB.
Error String 2.
 Rejected - Error on table TableA, column ColB.
Error String 2.
 Rejected - Error on table TableA, column ColA.
Error String 1.
 Rejected - Error on table TableA, column ColA.
Error String 1.

Each entry is two lines: the first line holds the table and column details, and the second line holds the error string. The same errors repeat hundreds of times in my file, so I need the count of each error per table.column, together with the unique error message and its table.column details.

It should come something like:

 Rejected - Error on table TableA, column ColA.
Error String 1.                                                   3 Rows
 Rejected - Error on table TableA, column ColB.
Error String 2.                                                   2 Rows

Thanks in advance

awk '{x=$0; getline; x=x"\n"$0; a[x]++}END{for (i in a){print i, a[i], "rows"}}' file

Thanks Balajesuri. The code is working perfectly.
If you don't mind, please explain the code, as I am new to AWK.

Hello Deekhari,

The following may help you.

awk '{x=$0;            ###### Store the complete current line in a variable named x.
getline;               ###### getline is an awk keyword that reads the next input line into $0.
x=x"\n"$0;             ###### Append a newline and then the new line ($0) to the previous value of x.
a[x]++}                ###### Use x as the index of an array named a and increment its count for each occurrence.
END{                   ###### Start of the END section, which runs after all input has been read.
for (i in a){          ###### Loop over every index i of the array a.
print i, a[i], "rows"} ###### Print the index i (the two-line message), then a[i] (the number of occurrences), then the string rows.
}' file                ###### file is the input file.
 

Let me know if you have any questions.

Thanks,
R. Singh
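
For anyone who wants to try the explanation above end to end, here is a minimal sketch. The file name sample.log, the function name count_pairs, and the variable names are made up for illustration; the data is the sample from the first post. It also assumes you want to skip a trailing line that has no partner, which is why the result of getline is checked.

```shell
# Recreate the sample input from the first post (sample.log is a hypothetical name).
cat > sample.log <<'EOF'
 Rejected - Error on table TableA, column ColA.
Error String 1.
 Rejected - Error on table TableA, column ColB.
Error String 2.
 Rejected - Error on table TableA, column ColB.
Error String 2.
 Rejected - Error on table TableA, column ColA.
Error String 1.
 Rejected - Error on table TableA, column ColA.
Error String 1.
EOF

# count_pairs FILE: treat every two consecutive lines as one message and count repeats.
count_pairs() {
  awk '
    {
      first = $0                          # line with the table and column details
      if ((getline second) > 0)           # read the following line; skip if input ended
        count[first "\n" second]++        # two-line message is the array index
    }
    END {
      for (msg in count)                  # one iteration per unique message
        print msg, count[msg], "rows"
    }
  ' "$1"
}

count_pairs sample.log
```

Note that awk does not guarantee any particular order for "for (msg in count)", so the two messages may come out in either order.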


Thanks a lot Ravinder. One more question.
My file is a sqlldr log that contains many other messages and content.
I was extracting only the error messages (the content shown in my first question) with the command below:

grep -A 1 "Rejected - Error" $FILE_NAME | cut -d':' -f2 

This command gives me the desired error messages, which I was initially redirecting to a file; the awk command then read that file to give me the error count with each error message.

Now I don't want to create a separate file and want to pass the strings directly to awk. How can I do that?

Thanks again for the help.

Try a small adaptation of balajesuri's proposal:

awk '/^ Rejected/ {x=$0; getline; x=x"\n"$0; a[x]++}END{for (i in a){print i, a[i], "rows"}}' file
 Rejected - Error on table TableA, column ColB.
Error String 2. 2 rows
 Rejected - Error on table TableA, column ColA.
Error String 1. 3 rows
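
If the goal is only to avoid the intermediate file, awk reads standard input when no file name is given, so the grep and cut pipeline from the question can feed it directly. Below is a rough sketch of that idea. The log contents, the "Record N:" prefix, the file name demo_sqlldr.log, and the function name summarize_rejects are all assumptions made so the example can run; the real log will differ. Keeping the /^ Rejected/ pattern also skips the "--" separator lines that grep -A prints between non-adjacent matches.

```shell
# Hypothetical stand-in for the real log; the layout here is an assumption.
cat > demo_sqlldr.log <<'EOF'
Some header text
Record 1: Rejected - Error on table TableA, column ColA.
Error String 1.
Some unrelated message
Record 2: Rejected - Error on table TableA, column ColB.
Error String 2.
Record 3: Rejected - Error on table TableA, column ColA.
Error String 1.
Total rows read
EOF

FILE_NAME=demo_sqlldr.log

# summarize_rejects FILE: same grep and cut as in the question, piped straight into awk.
summarize_rejects() {
  grep -A 1 "Rejected - Error" "$1" | cut -d':' -f2 |
    awk '
      /^ Rejected/ {                      # only start a pair on a "Rejected" line
        key = $0
        if ((getline detail) > 0)         # the next line is the error string
          count[key "\n" detail]++
      }
      END {
        for (k in count)
          print k, count[k], "rows"
      }
    '
}

summarize_rejects "$FILE_NAME"
```

With the made-up log above, the ColA message is counted twice and the ColB message once; no temporary file is written for the grep output.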

Here's a version that does not require AWK

perl -nle 'if($id){$save{"$id\n$_"}++}; ($id)=/^( Rejected.*$)/; END{for(keys %save){print "$_ $save{$_} rows"}}' deekhari.log
 Rejected - Error on table TableA, column ColB.
Error String 2. 2 rows
 Rejected - Error on table TableA, column ColA.
Error String 1. 3 rows