Shell script to get duplicate string

Hi All,

I have a requirement to get the count of each duplicate string along with the unique error message. Below is my file:

 Rejected - Error on table TableA, column ColA.
Error String 1.
 Rejected - Error on table TableA, column ColB.
Error String 2.
 Rejected - Error on table TableA, column ColB.
Error String 2.
 Rejected - Error on table TableA, column ColA.
Error String 1.
 Rejected - Error on table TableA, column ColA.
Error String 1.

Here the first line of each pair gives the table and column details, and the second line gives the error string. There are hundreds of repeated errors like this in my file. I need the count of each error per table.column, along with the unique error message and its table.column details.

It should come something like:

 Rejected - Error on table TableA, column ColA.
Error String 1.                                                   3 Rows
 Rejected - Error on table TableA, column ColB.
Error String 2.                                                   2 Rows

Thanks in advance

awk '{x=$0; getline; x=x"\n"$0; a[x]++}END{for (i in a){print i, a[i], "rows"}}' file

Thanks Balajesuri. The code is working perfectly.
If you don't mind, please explain the code, as I am new to AWK.

Hello Deekhari,

The following may help.

awk '{x=$0;            ###### Save the complete current line in a variable named x.
getline;               ###### getline is an awk keyword that reads the next input line into $0.
x=x"\n"$0;             ###### Append a newline and the new line ($0) to x, so x holds both lines.
a[x]++}                ###### Use x as the index of array a and increment its count on every occurrence.
END{                   ###### Start of the END section, which runs after all input is read.
for (i in a){          ###### Loop over every index i of array a.
print i, a[i], "rows"} ###### Print the index i (the two lines), its count a[i], and the string rows.
}' file                ###### file is the input file.
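
As a rough illustration of the pairing and counting described above, here is a minimal sketch. It is a variant, not the original one-liner: it checks the return value of getline so that an unpaired last line is skipped, remembers first-seen order in a helper array so the output order is predictable, and uses printf to pad the count roughly as in the requested layout. The names head, key, count, seen and part, the variable result2, and the column width of 20 are all assumptions made for this example.

```shell
# Three sample pairs taken from the first post, fed to awk on standard input.
result2=$(printf '%s\n' \
  ' Rejected - Error on table TableA, column ColA.' 'Error String 1.' \
  ' Rejected - Error on table TableA, column ColB.' 'Error String 2.' \
  ' Rejected - Error on table TableA, column ColA.' 'Error String 1.' |
  awk '{
         head = $0                       # first line of the pair
         if ((getline) <= 0) next        # no second line available: skip it
         key = head "\n" $0              # two-line key: header plus error string
         if (!(key in count)) seen[++n] = key   # remember first-seen order
         count[key]++
       }
       END {
         for (k = 1; k <= n; k++) {
           split(seen[k], part, "\n")    # part[1] = header, part[2] = error string
           printf "%s\n%-20s %d Rows\n", part[1], part[2], count[seen[k]]
         }
       }')

printf '%s\n' "$result2"
```

With this sample input the ColA pair should be reported with a count of 2 and the ColB pair with a count of 1.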

Let me know if you have any questions.

R. Singh


Thanks a lot Ravinder. One more question.
My file is a sqlldr log that contains many other messages and content.
I was extracting only the error messages (the content shown in my first question) with the command below:

grep -A 1 "Rejected - Error" "$FILE_NAME" | cut -d':' -f2

This command gives me the desired error messages, which I was initially redirecting to a file for the awk command to read and produce the error count with each error message.

Now I don't want to create a separate file; I want to pass the strings directly to the awk command. How can I do that?

Thanks again for the help.

Try a small adaptation of balajesuri's proposal:

awk '/^ Rejected/ {x=$0; getline; x=x"\n"$0; a[x]++}END{for (i in a){print i, a[i], "rows"}}' file
 Rejected - Error on table TableA, column ColB.
Error String 2. 2 rows
 Rejected - Error on table TableA, column ColA.
Error String 1. 3 rows
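
If you would rather keep the existing grep and cut pipeline, awk reads standard input when no file operand is given, so the output can be piped in without an intermediate file. The following is only a sketch under some assumptions: the temporary file stands in for the real sqlldr log and holds just the sample lines from the first post, grep -A is a GNU/BSD extension rather than POSIX, and the order array and the result variable are names introduced for this example to keep the output in first-seen order.

```shell
# Sample input copied from the first post (stands in for the real sqlldr log).
FILE_NAME=$(mktemp)
printf '%s\n' \
  ' Rejected - Error on table TableA, column ColA.' 'Error String 1.' \
  ' Rejected - Error on table TableA, column ColB.' 'Error String 2.' \
  ' Rejected - Error on table TableA, column ColB.' 'Error String 2.' \
  ' Rejected - Error on table TableA, column ColA.' 'Error String 1.' \
  ' Rejected - Error on table TableA, column ColA.' 'Error String 1.' \
  > "$FILE_NAME"

# No file operand after the awk program, so awk reads the piped output.
result=$(grep -A 1 "Rejected - Error" "$FILE_NAME" | cut -d':' -f2 |
  awk '/^ Rejected/ {
         x = $0
         if ((getline) > 0) x = x "\n" $0   # append the error line that follows
         if (!(x in a)) order[++n] = x      # remember first-seen order
         a[x]++
       }
       END { for (k = 1; k <= n; k++) print order[k], a[order[k]], "rows" }')

printf '%s\n' "$result"
rm -f "$FILE_NAME"
```

The /^ Rejected/ pattern also skips the "--" separator lines that grep -A prints between non-adjacent matches, so they do not disturb the pairing.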

Here's a version that does not require AWK

perl -nle 'if($id){$save{"$id\n$_"}++}; ($id)=/^( Rejected.*$)/; END{for(keys %save){print "$_ $save{$_} rows"}}' deekhari.log
 Rejected - Error on table TableA, column ColB.
Error String 2. 2 rows
 Rejected - Error on table TableA, column ColA.
Error String 1. 3 rows