Removing strings from a CSV file using a removal list from another file

What I need is to remove from each line of Location_file.txt every entry listed in Remove_location.txt.

Location_file.txt

FlowPrePaid, h3nmg1cm2,Jamaica_MTAImageFileFlowPrePaid,h0nmg1cm1, Flow_BeatTest,FlowRockTest
FlowNewTest,FlowNewTest,h0nmg1cm1
PartiallySubscribed, grndustyc42,h0nmg1cm1,PartialSub_Feb9
FlowBeatTest,Flow_BeatTest,h0nmg1cm1
FlowJazzTest,FlowJazzTest, h10nmg1cm1copy
NodeMonitor,Node_Monitor,h0nmg1cm1
h10nmg1cm1copy,FlowUltimateTest,h0nmg1cm1,UltimateTest
FlowRockTest,FlowRockTest,h0nmg1cm1
FlowRaveTest,FlowRaveTest,h0nmg1cm1
FlowIgnitionTest,FlowIgnitionTest,h0nmg1cm1
FlowJazz, h3nmg1cm2,h0nmg1cm1, Flow_BeatTest,FlowRockTest
FlowAcceleratorTest,FlowAcceleratorTest, h3nmg1cm2

Remove_location.txt

h0nmg1cm1
grndustyc42
h10nmg1cm1copy
h3nmg1cm2

The code I have tried with nested while loops is not working properly: for each value in Remove_location.txt, the first iteration removes only the first entry, "h0nmg1cm1", from Location_file.txt, but for the next value, "grndustyc42", it starts from the initial file again, without the removals from the previous iterations. So the output file always still contains all the previously cleared entries..!

So how can I carry the edited file forward each time, so that all the entries from Remove_location.txt end up removed?

The second part of the script removes duplicate lines, changes , to |, and copies all files named in the first field of final.txt to another location.

#!/bin/bash
> location_removed_out.txt
while read line
do
        while read cmts
        do
                CMTS_VAL=$(echo $line | awk '{gsub(/'$cmts,*'/,"")}1')
        done < Remove_location.txt
#       echo "line value is : $line"
#       echo "cmts_val is : $CMTS_VAL"
        echo $CMTS_VAL >> location_removed_out.txt
done < Location_file.txt
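
The carry-over problem is that CMTS_VAL is recomputed from the unchanged $line on every pass of the inner loop, so only the last removal survives. One way around it (a minimal sketch; spaces are treated as loosely as in the original, and sed stands in for the per-value awk call) is to seed a working variable from $line once and then apply every removal to that same variable:

#!/bin/bash
> location_removed_out.txt
while read -r line
do
        CMTS_VAL=$line                    # working copy; edited cumulatively below
        while read -r cmts
        do
                # strip the value plus any following comma, then any trailing comma
                CMTS_VAL=$(echo "$CMTS_VAL" | sed "s/$cmts,*//; s/, *$//")
        done < Remove_location.txt
        echo "$CMTS_VAL" >> location_removed_out.txt
done < Location_file.txt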

#Removing duplicate lines and changing , to |
awk '!seen[$0]++' location_removed_out.txt | tr "," "|" > final.txt

#Searching for file names and making copies

cd /home/webapps/project1/folder1
for f in $(awk -F'|' '{print $1}' final.txt)
do
        if [ -f "$f" ]
        then
                echo "$f found."
                cp -v "$f" /home/webapps/project1/"${f%.xml}"_$(date +%m%d%y).csv
        else
                echo "$f not found. Moving to next file....!" >> file_copyLog.txt
        fi
done
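
As an aside, for f in $(...) word-splits the extracted fields on any whitespace; a slightly more robust sketch of the copy loop (assuming final.txt is reachable from the target directory, as in the script above) reads the first |-separated field of each line directly:

cd /home/webapps/project1/folder1 || exit 1
while IFS='|' read -r f _             # f = first field, _ swallows the rest
do
        if [ -f "$f" ]
        then
                echo "$f found."
                cp -v "$f" /home/webapps/project1/"${f%.xml}"_$(date +%m%d%y).csv
        else
                echo "$f not found. Moving to next file....!" >> file_copyLog.txt
        fi
done < final.txt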

So the output file will be the file below, plus copies of the files named in field 1 of this file.
final.txt:

FlowPrePaid|Jamaica_MTAImageFileFlowPrePaid|Flow_BeatTest|FlowRockTest
FlowNewTest|FlowNewTest
PartiallySubscribed |PartialSub_Feb9
FlowBeatTest|Flow_BeatTest
FlowJazzTest|FlowJazzTest
NodeMonitor|Node_Monitor
FlowUltimateTest|UltimateTest
FlowRockTest|FlowRockTest
FlowRaveTest|FlowRaveTest
FlowIgnitionTest|FlowIgnitionTest
FlowJazz|Flow_BeatTest|FlowRockTest
FlowAcceleratorTest|FlowAcceleratorTest

As you deploy awk several times in your script anyhow, a single-pass awk script may come in handy. Your handling of spaces within or at the end of lines doesn't seem to be consistent, so some deviation from your desired output may have to be forgiven:

awk -F, -vOFS="|" 'NR == FNR {T[$1]; next} {for (t in T) gsub (t ",*|, *$", _); $1=$1}1' file2 file1 
FlowPrePaid| Jamaica_MTAImageFileFlowPrePaid| Flow_BeatTest|FlowRockTest
FlowNewTest|FlowNewTest
PartiallySubscribed| PartialSub_Feb9
FlowBeatTest|Flow_BeatTest
FlowJazzTest|FlowJazzTest
NodeMonitor|Node_Monitor
FlowUltimateTest|UltimateTest
FlowRockTest|FlowRockTest
FlowRaveTest|FlowRaveTest
FlowIgnitionTest|FlowIgnitionTest
FlowJazz|  Flow_BeatTest|FlowRockTest
FlowAcceleratorTest|FlowAcceleratorTest

As for the second part of the script, I'm afraid I didn't fully understand what you're after...?

EDIT: As there are no duplicates in your sample, I had to create a few; to remove them, make the script

awk -F, -vOFS="|" 'NR == FNR {T[$1]; next} {for (t in T) gsub (t ",*|, *$", _); $1=$1} !seen[$0]++' file2 file1
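
Here file2 is the removal list and file1 the location file; with the names from this thread, the full call producing final.txt would be

awk -F, -vOFS="|" 'NR == FNR {T[$1]; next} {for (t in T) gsub (t ",*|, *$", _); $1=$1} !seen[$0]++' Remove_location.txt Location_file.txt > final.txt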

Thanks, working perfectly..!

Can you give me a bit of explanation of how it works?

awk -F, -vOFS="|" 'NR == FNR {T[$1]; next} {for (t in T) gsub (t ",*|, *$", _); $1=$1} !seen[$0]++' file2 file1
awk -F,                                                 # set input  field separator to ","
 -vOFS="|"                                              # set output field separator to "|"
'NR == FNR                                              # if processing first file (file line No. == stream line No.)
                {T[$1]                                  # save "remove location" as index in an (empty) array
                 next                                   # stop processing script for this line; start over with next line
                }
                                                        # now in second file
                {for (t in T)                           # loop through T's indices (awk feature)
                                gsub (t ",*|, *$", _)   # replace index string plus any comma, or a trailing comma, with
                                                        # the empty string (unassigned variable "_")
                 $1 = $1                                # Replace all comma field separators with "|". man awk:  "Assignment
                                                        # to NF or to a field causes $0 to be reconstructed by concatenating
                                                        # the $i's  separated  by  OFS."
                }
!seen[$0]++                                             # print first occurrences of lines only (remove duplicates) 
' file2 file1
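
To see the $1 = $1 trick in isolation: any assignment to a field makes awk rebuild $0 with OFS between the fields, e.g.

echo "a,b,c" | awk -F, -vOFS="|" '{$1 = $1} 1'
a|b|c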