I have a short one-liner that performs a very rudimentary check for duplicate code:
sort myfile.cpp | uniq -c | grep -v "^ *1 " | grep -v "}"
It sorts the file, counts the occurrences of each line, removes lines that occur only once, and removes the ubiquitous closing brace. The code being checked is C++, but the approach extends easily to other programming languages.
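For reference, the same pipeline written out stage by stage might look like the sketch below. It assumes a POSIX shell with the standard `sort`, `uniq` and `grep` tools, and `dup_lines` is a hypothetical helper name. The count filter is anchored to the padded count that `uniq -c` prints at the start of each line, so a line such as `x = 1 + 2;` that merely contains "1 " is not dropped by accident.

```shell
#!/bin/sh
# dup_lines FILE (hypothetical helper): print each line of FILE that occurs
# more than once, prefixed by its occurrence count.
dup_lines() {
    sort "$1" |            # bring identical lines next to each other
        uniq -c |          # collapse each run, prefixing it with its count
        grep -v '^ *1 ' |  # drop lines whose count is exactly 1
        grep -v '}'        # drop any line that contains a closing brace
}
```

Usage: `dup_lines myfile.cpp`. With GNU or BSD `uniq`, `uniq -cd` ("count" plus "only print duplicated lines") should do the work of the `uniq -c | grep -v '^ *1 '` pair in a single step.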
I would like to make this a bit more advanced. A few examples:
1- Ignore leading whitespace (indentation), so that the following lines of output are considered identical:
2 for (i = 0; i < N; i++) {
2     for (i = 0; i < N; i++) {
2- Ignore whitespace within the code, so that the following lines of output are considered identical:
2 for (i = 0; i < N; i++) {
2 for ( i = 0; i < N; i++ ) {
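One possible approach, sketched below under the assumption of a POSIX `sed` that understands bracket expressions such as `[[:space:]]`, is to normalise each line before sorting. `norm_indent`, `norm_all` and `dup_normalised` are hypothetical names. `norm_indent` strips leading and trailing whitespace, which covers the first example; `norm_all` deletes every space and tab, which covers the second. Deleting all whitespace is deliberately blunt: it also equates lines such as `int x;` and `intx;`, a trade-off that seems acceptable for a rough check.

```shell
#!/bin/sh
# Hypothetical helpers: normalise whitespace before counting duplicates.

# Case 1: ignore indentation and trailing whitespace only.
norm_indent() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1"
}

# Case 2: ignore all whitespace, wherever it occurs in the line.
norm_all() {
    sed -e 's/[[:space:]]//g' "$1"
}

# dup_normalised NORMALISER FILE: run NORMALISER on FILE, then count
# duplicates as before, skipping singletons and lines with a closing brace.
dup_normalised() {
    "$1" "$2" | sort | uniq -c | grep -v '^ *1 ' | grep -v '}'
}
```

Usage: `dup_normalised norm_all myfile.cpp`. Note that the reported lines are the normalised forms (for example `for(i=0;i<N;i++){`), not the original source text.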
If there are easy ways to do this, I'd like to hear from you.
I am deliberately not excluding comment lines, such as those containing "/*", "*/" or "//", as doing so would weaken the case for telling developers to document their code better.
Any other one-liner ideas to check for duplicate code are also welcome.
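One further idea, offered as a minimal sketch assuming a POSIX `awk` (`dup_locations` is a hypothetical name): rather than only counting lines, record where each whitespace-normalised line occurs, across one or more files, so that duplicates can be traced back to their source. Only blank lines and lines consisting solely of `{` or `}` are skipped here, which is a narrower filter than `grep -v "}"`.

```shell
#!/bin/sh
# dup_locations FILE... (hypothetical helper): for every line that appears
# more than once across all FILEs after removing spaces and tabs, print
# its count, the normalised line, and each FILE:LINE where it occurs.
dup_locations() {
    awk '
        {
            key = $0
            gsub(/[ \t]+/, "", key)   # ignore all spaces and tabs
            if (key == "" || key == "{" || key == "}") next
            count[key]++
            where[key] = where[key] " " FILENAME ":" FNR
        }
        END {
            for (k in count)
                if (count[k] > 1)
                    print count[k] "\t" k "\t" where[k]
        }
    ' "$@"
}
```

Usage: `dup_locations *.cpp | sort -rn`. Because `awk` iterates over arrays in no particular order, piping through `sort -rn` lists the most frequent duplicates first.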