Removing duplicate files from list with different path

I have a list of all the jar files shipped with the product I work on. Some jar files appear in this list more than once, but under different folders.
My input file looks like this:

/path/1/to a.jar
/path/2/to a.jar
/path/1/to b.jar
/path/1/to c.jar
/path/1/to d.jar
/path/2/to c.jar

Now I need to remove the duplicate entries, i.e. the extra a.jar, c.jar and so on.

The final list should look like this:

/path/1/to a.jar
/path/1/to b.jar
/path/2/to c.jar
/path/1/to d.jar

This is the script I have so far:

#! /bin/sh
cp jar.txt jarnew.txt
for file in $(cat jar.txt)
do
        FILE1=`basename $file`
        for dup in $(cat jar.txt)
        do
                FILE2=`basename $dup`
                if [ "$file" != "$dup" -a "$FILE1" == "$FILE2" ] ; then
                        sed -e '/($dup)/ d' <jarnew.txt >jarnew.txt.tmp
                        echo "$FILE1 $FILE2"
                        mv jarnew.txt.tmp jarnew.txt
                fi
        done
done

But the main functionality of the script is not working, i.e. the if block does not do what I expect. It could be the sed command or the logic.

jarnew.txt comes out identical to jar.txt.

Any pointers on how to proceed?

Vino

Try this script:

#!/bin/sh
> final.jar
while read line; do

    FILE=`basename $line`
    DIR=`echo $line | awk '{ print $(NF-1) }'`
    if [[ $FILE == "c.jar" && $DIR == "2" ]]
    then
        echo $line >> final.jar
    elif [[ $DIR == "1" && $FILE != "c.jar" ]]
    then
        echo $line >> final.jar
    fi

done < jar.txt

You will get the result. Check it.

On the assumption that you only have filenames in the list ...

sort -t"/" -u +3 jar.txt > jarnew.txt
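The obsolete `+POS` field syntax (zero-based) can also be spelled with the standard `-k` option (one-based); an equivalent sketch, still assuming the jar name is always the fourth `/`-separated field:

```shell
# -t/ : split fields on "/"
# -k4 : compare from the fourth field (the jar name) to the end of the line
# -u  : keep only one line per distinct key
sort -t/ -u -k4 jar.txt > jarnew.txt
```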

Muthu,

I don't intend to hard-code any particular jar file the way you have in

if [[ $FILE == "c.jar" && $DIR == "2" ]]

Rather, I would like it generalized.

How do you go about that ?

Vino

Generally, scripts are written around a pattern. In your sample input, only c.jar is taken from path/2/, so I simulated your input to produce the required output.

Since your input and output did not show a general rule, the script was written using specific filenames.

There are too many jar files for that. If I could collect these jar files manually, I might as well do away with the script.

I need to get the jar files from the list, dynamically.

Vino

JustIce,

I don't think sort is a possible solution. The path length varies, i.e. the directory structure differs between files. Some of them have a depth of 3, others a depth of more than 3.

Vino

#! /bin/ksh

jarlist=`awk -F"/" '{print $NF}' jar.txt | sort -u`
for jarfile in $jarlist
do
    grep $jarfile jar.txt | uniq
done > jarnew.txt

exit 0

JustIce,

Just tried out your solution. It sorts the entries, but I can still see the duplicate entries.

Vino

Try this one ...

#! /bin/ksh

# unique list of jar basenames (last "/"-separated field)
jarlist=`awk -F"/" '{print $NF}' jar.txt | sort -u`
for jarfile in $jarlist
do
    # keep only the first path that matches each jar name
    grep $jarfile jar.txt | head -1
done > jarnew.txt

exit 0
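The loop above can also be collapsed into a single awk pass; a sketch that, like the `grep ... | head -1` loop, keeps the first path seen for each jar basename:

```shell
# seen[] is an awk associative array keyed on the last "/"-separated
# field (the jar name); the post-increment makes the pattern true only
# the first time each basename appears, so only that line is printed.
awk -F/ '!seen[$NF]++' jar.txt > jarnew.txt
```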

JustIce,

That works perfectly!

Thanks !