I tried this code but did not get my desired output:
counter=0;
for file in `cat FILE1.txt | awk -F'[/_.]' '{print $3$4$5$6}'`
do
x=`echo "$file"`
while read eachline
do
y=`echo "$eachline" | cat temp.txt | awk -F'[/_.]' '{print $1$2$3$4}'`
if [ "$x"=="$y" ]
then
cp -v $file /home/imran/Script/data
counter=$((counter+1))
break
fi
done < FILE2.txt
echo $counter
done
I have also tried it this way:
counter=0;
for f in `awk 'NR>2{print}' FILE1.txt`
do
f3=$(echo $f|awk -F'/' '{print $2}');
f6=$(echo "${f3%%.*}");
for g in `awk 'NR>=1{print}' FILE2.txt`
do
if [ "$f"=="$g" ]
then
cp $f /home/imran/Script/data
counter=$((counter+1))
break;
fi
done
echo $counter
done
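One thing worth checking in both attempts: the test [ "$x"=="$y" ] has no spaces around the operator, so the shell hands test a single non-empty word, and a single non-empty word is always true. That would make every comparison succeed. A minimal sketch of the difference (the helper name same_name is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when both arguments are the same string.
same_name() {
    [ "$1" = "$2" ]        # spaces around = make this a real comparison
}

# Without spaces the test sees the single word a==b, which is non-empty
# and therefore always true, whatever the two values are.
if [ "a"=="b" ]; then echo "no spaces: true"; fi

if same_name "a" "b"; then
    echo "with spaces: true"
else
    echo "with spaces: false"
fi
# prints:
#   no spaces: true
#   with spaces: false
```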
Please let me know whether the following helps.
awk 'NR == FNR {   #### NR and FNR are built-in awk variables. FNR restarts at 1 for every new input file, while NR keeps increasing until all files have been read, so NR == FNR is TRUE only while the first file (file2 here) is being read.
T[$1]   #### create an element in the array named T whose index is $1 (the first field).
next}   #### next (an awk keyword) skips all further statements for this line.
#### All following statements run only while the second file, named file1, is being read.
{FN = $0   #### create a variable named FN whose value is $0 (the complete line).
gsub (/^.*\/|\.txt$/, _)}   #### gsub (an awk built-in function) globally substitutes a pattern in a line or variable, the line in this case. It replaces everything up to the last / and the trailing .txt (as per your requirement) with the empty string; _ is an unset variable, so it is empty.
$0 in T {   #### TRUE when the line (as reduced by the substitution above) is an index of the array named T, which was created while file2 was being read under the NR==FNR condition.
system ("echo cp " FN " /some/where")   #### system (which executes shell commands from inside awk) runs echo here, which prints the command we actually want to perform, cp source_file target_dir in this case.
}' file2 file1   #### Mentioning the input files named file2 and file1 here.
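To see the NR == FNR condition at work in isolation, here is a small sketch that only prints which branch handles each line. The function name show_nr_fnr and the sample file contents are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: report NR and FNR for every line of two input files.
show_nr_fnr() {
    awk 'NR == FNR {print "first file:  NR=" NR " FNR=" FNR; next}
                   {print "second file: NR=" NR " FNR=" FNR}' "$@"
}

# Made-up sample data: two lines in file2, three lines in file1.
tmp=$(mktemp -d)
printf 'a\nb\n'    > "$tmp/file2"
printf 'x\ny\nz\n' > "$tmp/file1"

show_nr_fnr "$tmp/file2" "$tmp/file1"
# prints:
#   first file:  NR=1 FNR=1
#   first file:  NR=2 FNR=2
#   second file: NR=3 FNR=1
#   second file: NR=4 FNR=2
#   second file: NR=5 FNR=3

rm -rf "$tmp"
```

Note that the idiom relies on the first file having at least one line; with an empty first file, NR == FNR would also be true while the second file is read.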
awk '
NR == FNR {T[$1] # for the first file (NR equal to FNR), collect the names to search for in the T array
next # stop processing this line; read the next one
}
{FN = $0 # second file only: save the full file path in the FN variable
gsub (/^.*\/|\.txt$/, _) # remove leading path info and the ".txt" extension from the file name
}
$0 in T {system ("echo cp " FN " /some/where") # IF the reduced file name is found in pattern array T, run the
# system command to cp FN (full file path) to destination (echo inserted for safety)
}
' file2 file1
Each call to system() in awk invokes a shell, which then invokes cp. If there are 6000 files to be copied, invoking one shell for all the copies instead of 6000 should be considerably faster. Consider this small change to RudiC's suggestion:
awk '
NR == FNR {T[$1]
next
}
{FN = $0
gsub (/^.*\/|\.txt$/, _)
}
$0 in T {print "cp", FN, "/some/where"
}
' file2 file1 | sh
And if the cp utility on your system has a -t destination_directory option (an extension not covered by the standards), you could gain even more by using xargs to greatly reduce the number of times cp is invoked:
awk '
NR == FNR {T[$1]
next
}
{FN = $0
gsub (/^.*\/|\.txt$/, _)
}
$0 in T {print FN
}
' file2 file1 | xargs cp -t "/some/where"
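If cp -t turns out not to be available, a similar effect can probably be had by letting xargs hand the names to a small sh -c wrapper that puts the destination last. This is only a sketch: the function name copy_listed is made up, and like the pipeline above it assumes the file names contain no whitespace or quote characters, since xargs splits its input on those.

```shell
#!/bin/sh
# Hypothetical helper: read file names on stdin and copy them into directory $1.
# xargs appends as many names as fit to each invocation of the inner sh, which
# receives the destination as $0 and the file names as "$@".
copy_listed() {
    xargs sh -c 'if [ "$#" -gt 0 ]; then cp -- "$@" "$0"; fi' "$1"
}

# Usage sketch, feeding it the list of full paths printed by awk:
#   awk '...' file2 file1 | copy_listed /some/where
```

The check on "$#" is there because xargs may run the command once even when its input is empty, and cp with only a destination operand would fail.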
I considered that as well. cp, at least in some versions, can copy multiple input files to a target directory in a single invocation. That could be done like this:
awk '
BEGIN {printf "echo cp" # prepare shell statement
}
NR == FNR {T[$1] # for the first file (NR equal to FNR), collect the names to search for in the T array
next # stop processing this line; read the next one
}
{FN = $0 # second file only: save the full file path
gsub (/^.*\/|\.txt$/, _) # remove leading path info and the ".txt" extension from the file name
}
$0 in T {printf " %s ", FN # IF the reduced file name is found in pattern array T, print the FN (full file path)
}
END {print " /some/where" # finish shell statement
}
' file2 file1 | sh
cp ./5_April_2012_Page323.txt ./6_August_2012_Page328.txt ./10_February_2014_Sportz6.txt /some/where
It may overrun system limits if too many files are to be copied, though.
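To get a feel for that limit, getconf can report ARG_MAX, the maximum combined size in bytes of the arguments and environment passed to a newly executed program. A quick check (the variable name limit is arbitrary):

```shell
#!/bin/sh
# ARG_MAX covers the argument list plus the environment of the new process,
# so the room left for file names is somewhat smaller than the number shown.
limit=$(getconf ARG_MAX)
echo "ARG_MAX is $limit bytes"
```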
We could squeeze a few more source file operands into a cp command if we drop the leading ./ from the file operands:
$0 in T {printf " %s.txt", $0 # IF the reduced file name is found in pattern array T, print the filename
If imranrasheedamu tells us that cp -t target is not available and the above script fails with E2BIG errors, we could also make some simple modifications to the above script to put no more than x source file operands in each cp command where x is 50, 100, or some other conservative number based on the maximum filename length, the size of the combined environment variables, and ARG_MAX on the system. But I don't see any reason to spend the time to do that unless imranrasheedamu lets us know that it is needed.
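For what it is worth, the batching idea could look roughly like the sketch below. It is not tuned for any particular system: the function name batched_cp_commands is made up, the batch size is simply passed in as the first argument, and /some/where is the same placeholder destination used above. It only prints the cp commands, so the output can be inspected before it is piped to sh.

```shell
#!/bin/sh
# Hypothetical helper: print cp commands with at most $1 source files each.
# $2 is the list of wanted names (file2), $3 is the list of paths (file1).
batched_cp_commands() {
    awk -v max="$1" -v dest="/some/where" '
    NR == FNR {T[$1]; next}              # first file: remember the wanted names
    {FN = $0                             # second file: keep the full path
     gsub(/^.*\/|\.txt$/, "")}           # reduce the line to the bare name
    $0 in T {
        batch = batch " " FN             # add this path to the current batch
        if (++n == max) {print "cp" batch, dest; batch = ""; n = 0}
    }
    END {if (n) print "cp" batch, dest}  # flush the last, partial batch
    ' "$2" "$3"
}

# Usage sketch: at most 50 source files per cp command.
#   batched_cp_commands 50 file2 file1 | sh
```

With a batch size of 2 and three matching paths, this prints one cp command with two source files followed by one with the remaining file.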