How to get Duplicate rows in a file

Hi all,

I have written a shell script. The output file of this script contains SQL output.

From that file, I want to extract the rows that have multiple entries (duplicate rows).
For example, the output file looks like this:

===============================================================
<SH12_MC30_CE_VS_NY_HIST_T>

397 44847
400 33653
401 46455

<SH12_MC30_CE_VS_NY_HIST_T_BKP>

397 44847
398 40107
399 39338
400 33653

In this output, I want only the duplicate numeric rows. The file also contains separator and header lines, and a naive duplicate check would count those as duplicates too. So I want only the entries that appear more than once and that consist of numbers.

Can anyone please tell me the command?
Thanks in advance.

Regards,
Raghu.

cat file1 file2 | \
   grep -v -e '^=' -e '^<' -e '^$' | \
   awk '{ arr[$0]++ } END { for (i in arr) if (arr[i] > 1) print i }' > newfile

cat pipes the files into grep so grep's output carries no filename prefixes; grep removes the header, separator, and blank lines; awk counts each remaining line and prints those seen more than once.
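
For example, if the sample output above is saved as file1 (name assumed), running the same pipeline without the redirect shows the repeated numeric rows; note that awk's for (i in arr) does not guarantee any particular output order:

> cat file1 | grep -v -e '^=' -e '^<' -e '^$' | awk '{ arr[$0]++ } END { for (i in arr) if (arr[i] > 1) print i }'
397 44847
400 33653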

Try this

#!/bin/ksh
# sort so that duplicate lines become adjacent, then print each line that matches the previous one
sort "$1" > sortedfile
nawk 'BEGIN { while ((getline line < "sortedfile") > 0) { if (line == prev && line != last) { print line; last = line }; prev = line } }'
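
A minimal run, assuming the script above is saved as finddups.ksh and the SQL output is in sqlout.txt (both names hypothetical). Filtering to numeric lines first keeps repeated blank or separator lines out of the result:

> grep "^[0-9]" sqlout.txt > numeric.tmp
> ./finddups.ksh numeric.tmp
397 44847
400 33653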

Hi Jim,

I could understand your command up to the second line.
I couldn't understand the awk part, because I don't know awk's features.
But it is working, thank you very much for that. awk is so nice.
Can you give me another way to do it, without using awk?

Thanks & Regards,
Raghunadh.

nawk '/^[0-9]/ {a[$0]++} END {for (i in a) if (a[i] > 1) print i}' myOutputFile
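
Spelled out with comments, this is the same one-liner expanded (myOutputFile is whatever file your script writes):

nawk '
/^[0-9]/ { a[$0]++ }           # count each line that starts with a digit
END {                          # after the whole file has been read
    for (i in a)               # walk the distinct lines
        if (a[i] > 1)          # keep only those seen more than once
            print i
}' myOutputFile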

Hi vgersh99,

Thank you very much for your reply.
The 'nawk' command is nice, but I don't know awk's functionality, so if I put this command in my script I can't explain it to anyone. Could you please give me a command that does not use 'awk' or 'nawk'?

Thanks in advance,

Regards,
Raghu.

I used awk at the end only to tidy the output format. This could be done with a cut command as well, although extra care is needed with field positioning.

> cat file9
===============================================================
<SH12_MC30_CE_VS_NY_HIST_T>
===============================================================
397 44847
400 33653
401 46455
===============================================================
<SH12_MC30_CE_VS_NY_HIST_T_BKP>
===============================================================
397 44847
398 40107
399 39338
400 33653

> grep "^[0-9]" file9 | sort | uniq -cd
      2 397 44847
      2 400 33653

> grep "^[0-9]" file9 | sort | uniq -cd | awk '{print $2" "$3}'
397 44847
400 33653

and, if you really don't want awk

> grep "^[0-9]" file9 | sort | uniq -cd | tr -s " " | cut -d" " -f3-4
397 44847
400 33653

Added a quicker way:

> grep "^[0-9]" file9 | sort | uniq -d 
397 44847
400 33653
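
For sorted input, uniq -d prints exactly one copy of each line that occurs more than once, so this last version needs no awk, tr, or cut at all.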

Hi joeyg,

Thank you very much.

Regards,
Raghu.