Replace string and create new file multiple times

pseudo.seppuku · May 25, 2016, 12:17pm

First of all, apologies if this has already been answered elsewhere. I haven't quite been able to find what I'm looking for yet, so hopefully this won't come across as repetition.

I have a file consisting of ~100 nearly identical lines, each of which contains multiple instances of the string I need to replace. Let's say the string is simply "001". Each instance of the string within a single line has a slightly different pre/suffix that can be used for search purposes. I know sed is capable of addressing this, but I'm not sure how to proceed with the following steps...

I need to create a new file where the 001 is replaced with 002, while everything else in the line remains unchanged. This will go on until, say, 300 - thus generating 300 different files, each of which needs to be saved separately.

So the first file contains a list like this:

-file1 001.txt -file2 blah1.txt -outputx x001blah1 -outputy y001blah1
-file1 001.txt -file2 blah2.txt -outputx x001blah2 -outputy y001blah2
-file1 001.txt -file2 blah3.txt -outputx x001blah3 -outputy y001blah3

...saved as list001.txt. I need to replace the three instances of 001 with 002 and save that file as list002.txt, which would look like this:

-file1 002.txt -file2 blah1.txt -outputx x002blah1 -outputy y002blah1
-file1 002.txt -file2 blah2.txt -outputx x002blah2 -outputy y002blah2
-file1 002.txt -file2 blah3.txt -outputx x002blah3 -outputy y002blah3

And so on and so forth until list300.txt (this number is pre-defined).

If I were to do this manually, it would require searching and replacing over a thousand times. Is there a single script I can use to find, replace, and create all 300 files, without having to make and modify each one individually?

Corona688 · May 25, 2016, 12:26pm

To be honest, these 300 lists sound like something that could be replaced with a single script, if you could tell us what you're actually trying to do.

pseudo.seppuku · May 25, 2016, 12:45pm

Each line corresponds to an individual command for a program designed to compare two different files (e.g. 001/blah1, 001/blah2, 001/blah3, etc.). I can't stray from the basic format described in my first post or carry out multiple comparisons using a single command. Given that I have a few hundred files to compare, I'm looking for a quicker way to generate this list of commands.

I'll join all 300 files together later on using cat, but at this stage, I figured it might be easier (from a scripting standpoint) to create a new file after each replacement, rather than continually append a set of modified lines to the end of one very long file.
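If the per-number files are only a stepping stone towards one long command list, the whole loop can be redirected once, so nothing has to be appended piecemeal or joined with cat afterwards. The following is a minimal sketch, not a tested recipe for the real data: it assumes the template is called list001.txt as in the first post, the output name all_commands.txt is hypothetical, and the scratch directory and two sample lines exist only to make the sketch runnable on its own.

```shell
#!/bin/sh
# Sketch only: work in a scratch directory with a small sample template.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
cat > list001.txt <<'EOF'
-file1 001.txt -file2 blah1.txt -outputx x001blah1 -outputy y001blah1
-file1 001.txt -file2 blah2.txt -outputx x001blah2 -outputy y001blah2
EOF

max=300                              # last number, pre-defined per the first post
i=1
while [ "$i" -le "$max" ]; do
    n=$(printf '%03d' "$i")          # zero-padded counter: 001, 002, ...
    sed "s/001/$n/g" list001.txt     # for i=1 this simply copies the template
    i=$((i + 1))
done > all_commands.txt              # hypothetical name; one redirection for the whole loop
```

Because the redirection sits on the loop rather than on each sed call, the output file is opened once and truncated once, which sidesteps the question of appending versus overwriting.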

RavinderSingh13 · May 25, 2016, 12:51pm

Hello pseudo.seppuku,

Welcome to the forums. Could you please try the following and let me know if it helps?

for file in *.txt
do
     let "i = i + 1"
     awk -v I="$i" '{gsub(/001/,"002",$0);VAL=sprintf("%s%02d",FILENAME,I);print >> VAL}' "$file"
done

Thanks,
R. Singh

pseudo.seppuku · May 25, 2016, 1:11pm

Thank you! That certainly did the trick. The only caveat is that the new file is called list001.txt01 instead of list002.txt.

Aside from the file-naming convention... is there a way to do this multiple times in a row, using a single script? In other words, i = i + (1..299) - each time saved as a new file?

RudiC · May 25, 2016, 1:11pm

Try

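awk '
        {ARR[NR] = $0}                          # keep every input line in memory
END     {for (i=1; i<=MAX; i++) {
                if (FN) close (FN)              # close the previous output file
                TCNT = sprintf ("%03d", i)      # zero-padded counter: 001, 002, ...
                FN = FILENAME TCNT ".txt"       # output name, e.g. file002.txt
                for (j=1; j<=NR; j++) {
                        T = ARR[j]
                        gsub (/001/, TCNT, T)   # replace every 001 on the line
                        print T > FN
                }
        }
        }
' MAX=3 file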
cf *.txt
file001.txt:
-file1 001.txt -file2 blah1.txt -outputx x001blah1 -outputy y001blah1
-file1 001.txt -file2 blah2.txt -outputx x001blah2 -outputy y001blah2
-file1 001.txt -file2 blah3.txt -outputx x001blah3 -outputy y001blah3
file002.txt:
-file1 002.txt -file2 blah1.txt -outputx x002blah1 -outputy y002blah1
-file1 002.txt -file2 blah2.txt -outputx x002blah2 -outputy y002blah2
-file1 002.txt -file2 blah3.txt -outputx x002blah3 -outputy y002blah3
file003.txt:
-file1 003.txt -file2 blah1.txt -outputx x003blah1 -outputy y003blah1
-file1 003.txt -file2 blah2.txt -outputx x003blah2 -outputy y003blah2
-file1 003.txt -file2 blah3.txt -outputx x003blah3 -outputy y003blah3
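
The listing above was produced from an input called file. With an input named list001.txt, FILENAME TCNT ".txt" would yield names such as list001.txt002.txt, so a variant that first strips the trailing 001.txt may be closer to the list002.txt naming asked for. This is a hedged sketch under those assumptions: the lower-case variable names, the base variable, and the scratch directory with its sample template are illustrative and not part of the script above.

```shell
#!/bin/sh
# Sketch only: work in a scratch directory with a three-line sample template.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
cat > list001.txt <<'EOF'
-file1 001.txt -file2 blah1.txt -outputx x001blah1 -outputy y001blah1
-file1 001.txt -file2 blah2.txt -outputx x001blah2 -outputy y001blah2
-file1 001.txt -file2 blah3.txt -outputx x001blah3 -outputy y001blah3
EOF

awk -v max=300 '
    { line[NR] = $0 }                    # keep the template in memory
    END {
        base = FILENAME
        sub(/001\.txt$/, "", base)       # list001.txt -> list
        for (i = 2; i <= max; i++) {     # start at 2 so the input is not overwritten
            n = sprintf("%03d", i)
            out = base n ".txt"          # list002.txt, list003.txt, ...
            for (j = 1; j <= NR; j++) {
                t = line[j]
                gsub(/001/, n, t)        # replace every 001 on the line
                print t > out
            }
            close(out)                   # avoid running out of open files
        }
    }
' list001.txt
```

Starting the counter at 2 leaves list001.txt itself untouched, which matters here because the template and the first output would otherwise share a name.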

RavinderSingh13 · May 25, 2016, 1:31pm

Hello pseudo.seppuku,

Could you please try the following and let me know if it helps? It is not tested, though.

for file in *.txt; do let "i = i + 1"; awk -v I="$i" '{gsub(/001/,"002",$0);VAL=sprintf("%02d",I);print >> (FILENAME VAL);}' "$file"; done

Thanks,
R. Singh

pseudo.seppuku · May 25, 2016, 2:01pm

rudic:

Try

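awk '
        {ARR[NR] = $0}                          # keep every input line in memory
END     {for (i=1; i<=MAX; i++) {
                if (FN) close (FN)              # close the previous output file
                TCNT = sprintf ("%03d", i)      # zero-padded counter: 001, 002, ...
                FN = FILENAME TCNT ".txt"       # output name, e.g. file002.txt
                for (j=1; j<=NR; j++) {
                        T = ARR[j]
                        gsub (/001/, TCNT, T)   # replace every 001 on the line
                        print T > FN
                }
        }
        }
' MAX=3 file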

EGADS. This worked beautifully!

One minor question while I try to work out all the syntax: the two sets of files for comparison (what I previously referred to as 001.txt and blah1.txt) actually have the same numbering system - since there are hundreds of them - but with a different letter prefix for each set, i.e., a001..a300.txt and b001..b300.txt. I only want the numbers starting with a to increase without changing b. Where should I specify/add this prefix?

In other words, the actual lines look more like this:

-file1 a001.txt -file2 b001.txt -outputx xa001b001 -outputy ya001b001
-file1 a001.txt -file2 b002.txt -outputx xa001b002 -outputy ya001b002
-file1 a001.txt -file2 b003.txt -outputx xa001b003 -outputy ya001b003

As you can see, the current script would change the numbers for a (as intended) but also the corresponding number for b. This wouldn't be difficult to cross-check if I only had a few lines, but with hundreds, it becomes somewhat more tedious to correct manually. Sorry for the confusion! I was trying to keep my first post as simple as possible.

---------- Post updated at 07:01 PM ---------- Previous update was at 06:47 PM ----------

ravindersingh13:

Hello pseudo.seppuku,

Could you please try the following and let me know if it helps? It is not tested, though.
for file in *.txt; do let "i = i + 1"; awk -v I="$i" '{gsub(/001/,"002",$0);VAL=sprintf("%02d",I);print >> (FILENAME VAL);}' "$file"; done
Thanks,
R. Singh

Just tried it. The search and replace bit works perfectly. It only creates a single new file though, called file02. However, it's a little strange because if I delete this file and run the script again, the replacement is called file03 (instead of file02 again). Hope that makes sense.

RavinderSingh13 · May 25, 2016, 2:05pm

pseudo.seppuku:

EGADS. This worked beautifully!

One minor question while I try to work out all the syntax: the two sets of files for comparison (what I previously referred to as 001.txt and blah1.txt) actually have the same numbering system - since there are hundreds of them - but with a different letter prefix for each set, i.e., a001..a300.txt and b001..b300.txt. I only want the numbers starting with a to increase without changing b. Where should I specify/add this prefix?

Sorry for the confusion!

Just tried it. The search and replace bit works perfectly. It only creates a single new file though, called file02. However, it's a little strange because if I delete this file and run the script again, the replacement is called file03 (instead of file02 again). Hope that makes sense.

Hello pseudo.seppuku,

That's because you ran it as a command rather than as a script. When you run it at the prompt, the variable i keeps its value in the shell's memory, so the next run carries on counting from there. If you save it as a script and run that, this will not happen. As for the file names, I am still a little confused: could you show the mapping you need, in the form current_file_name --> new_file_name? I hope this helps.

Thanks,
R. Singh
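
The effect described above can be reproduced without awk at all. The sketch below is illustrative only: it uses the portable $((i + 1)) form in place of let "i = i + 1", and the names first, second and reset are made up for the demonstration. It shows a counter left over from an earlier run continuing to count, and how setting i=0 before the loop makes each run start from the same place.

```shell
#!/bin/sh
# Sketch: a counter left over from an earlier run keeps counting.
unset i
i=$((i + 1)); first=$i       # "first run": i was unset, so this yields 1
i=$((i + 1)); second=$i      # "second run" in the same shell: yields 2

# Resetting the counter at the top of each run makes the result repeatable.
i=0
i=$((i + 1)); reset=$i       # 1 again, regardless of earlier runs
echo "$first $second $reset" # prints: 1 2 1
```

A script started as its own process gets a fresh i each time, which is why saving the loop to a file and running that file avoids the problem.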

pseudo.seppuku · May 25, 2016, 2:19pm

Sorry, the first comment (regarding prefixes) was referring to RudiC's script.

I only just started fiddling with Linux recently so that's good to know.

However, what I meant about your script was that it only creates a single new file (file001 with string 001 -> file002 with string 002). I need this done a few hundred times, until file300 with string 300.

Corona688 · May 25, 2016, 3:27pm

It generates a filename dynamically, based on the input filename. Or are you saying each input file creates 300 output files?

RudiC · May 25, 2016, 4:28pm

pseudo.seppuku:

.
.
.
a different letter prefix for each set, i.e., a001..a300.txt and b001..b300.txt. I only want the numbers starting with a to increase without changing b. Where should I specify/add this prefix?

In other words, the actual lines look more like this:
-file1 a001.txt -file2 b001.txt -outputx xa001b001 -outputy ya001b001
-file1 a001.txt -file2 b002.txt -outputx xa001b002 -outputy ya001b002
-file1 a001.txt -file2 b003.txt -outputx xa001b003 -outputy ya001b003
As you can see, the current script would change the numbers for a (as intended) but also the corresponding number for b.
.
.
.

Not sure I get it - do you want the "a" numbers modified in the file, or do you want to work only on the files whose names start with "a"? In case of the former, try gsub (/a001/, "a" TCNT, T); in case of the latter, try a*.txt for the file name parameter.
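
For the first reading (change the "a" numbers inside the file, leave the "b" numbers alone), the prefixed gsub can be tried on the sample lines from the question. This is a sketch under assumptions, not a verified answer for the real data: the template name list001.txt, the hard-coded "list" output prefix, the lower-case variable names and the scratch directory are all illustrative.

```shell
#!/bin/sh
# Sketch only: work in a scratch directory with the a/b sample lines.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
cat > list001.txt <<'EOF'
-file1 a001.txt -file2 b001.txt -outputx xa001b001 -outputy ya001b001
-file1 a001.txt -file2 b002.txt -outputx xa001b002 -outputy ya001b002
-file1 a001.txt -file2 b003.txt -outputx xa001b003 -outputy ya001b003
EOF

awk -v max=300 '
    { line[NR] = $0 }                      # keep the template in memory
    END {
        for (i = 2; i <= max; i++) {
            n = sprintf("%03d", i)
            out = "list" n ".txt"          # assumed naming: list002.txt ...
            for (j = 1; j <= NR; j++) {
                t = line[j]
                gsub(/a001/, "a" n, t)     # only 001 preceded by "a" is touched
                print t > out
            }
            close(out)
        }
    }
' list001.txt
```

Anchoring the pattern on the letter means b001 is never matched, even though it contains the same digits.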