Replace string and create new file multiple times

First of all, apologies if this has already been answered elsewhere. I haven't quite been able to find what I'm looking for yet, so hopefully this won't come across as repetition.

I have a file consisting of ~100 nearly identical lines, each of which contains multiple instances of the string I need to replace. Let's say the string is simply "001". Each instance of the string within a single line has a slightly different pre/suffix that can be used for search purposes. I know sed is capable of addressing this, but I'm not sure how to proceed with the following steps...

I need to create a new file where the 001 is replaced with 002, while everything else in the line remains unchanged. This will go on until, say, 300 - thus generating 300 different files, each of which needs to be saved separately.

So the first file contains a list like this:

-file1 001.txt -file2 blah1.txt -outputx x001blah1 -outputy y001blah1
-file1 001.txt -file2 blah2.txt -outputx x001blah2 -outputy y001blah2
-file1 001.txt -file2 blah3.txt -outputx x001blah3 -outputy y001blah3

...saved as list001.txt. I need to replace the three instances of 001 with 002 and save that file as list002.txt, which would look like this:

-file1 002.txt -file2 blah1.txt -outputx x002blah1 -outputy y002blah1
-file1 002.txt -file2 blah2.txt -outputx x002blah2 -outputy y002blah2
-file1 002.txt -file2 blah3.txt -outputx x002blah3 -outputy y002blah3

And so on and so forth until list300.txt (this number is pre-defined).

If I were to do this manually, it would require searching and replacing over a thousand times. Is there a single script I can use to find, replace, and create all 300 files, without having to make and modify each one individually?

To be honest, these 300 lists sound like something that could be replaced with a single script, if you could tell us what you're actually trying to do.

Each line corresponds to an individual command for a program designed to compare two different files (e.g. 001/blah1, 001/blah2, 001/blah3, etc.). I can't stray from the basic format described in my first post or carry out multiple comparisons using a single command. Given that I have a few hundred files to compare, I'm looking for a quicker way to generate this list of commands.

I'll join all 300 files together later on using cat, but at this stage, I figured it might be easier (from a scripting standpoint) to create a new file after each replacement, rather than continually append a set of modified lines to the end of one very long file.

Hello pseudo.seppuku,

Welcome to the forums. Could you please try the following and let me know if it helps?

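for file in *.txt
do
     let "i = i + 1"     # bash builtin; note that i is not initialised here
     # Replace every 001 with 002 and append the result to <input name><2-digit counter>
     awk -v I="$i" '{gsub(/001/,"002",$0);VAL=sprintf("%s%02d",FILENAME,I);print >> VAL}' "$file"
done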

Thanks,
R. Singh

Thank you! That certainly did the trick. The only caveat is that the new file is called list001.txt01 instead of list002.txt.

Aside from the file-naming convention... is there a way to do this multiple times in a row, using a single script? In other words, i = i + (1..299) - each time saved as a new file?

Try

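awk '
        { ARR[NR] = $0 }                        # buffer every input line
END     { for (i = 1; i <= MAX; i++) {
                if (FN) close (FN)              # close the previous output file
                TCNT = sprintf ("%03d", i)      # zero-padded counter: 001, 002, ...
                FN = FILENAME TCNT ".txt"
                for (j = 1; j <= NR; j++) {
                        T = ARR[j]
                        gsub (/001/, TCNT, T)   # replace every 001 in the line copy
                        print T > FN
                }
          }
        }
' MAX=3 file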
cf *.txt
file001.txt:
-file1 001.txt -file2 blah1.txt -outputx x001blah1 -outputy y001blah1
-file1 001.txt -file2 blah2.txt -outputx x001blah2 -outputy y001blah2
-file1 001.txt -file2 blah3.txt -outputx x001blah3 -outputy y001blah3
file002.txt:
-file1 002.txt -file2 blah1.txt -outputx x002blah1 -outputy y002blah1
-file1 002.txt -file2 blah2.txt -outputx x002blah2 -outputy y002blah2
-file1 002.txt -file2 blah3.txt -outputx x002blah3 -outputy y002blah3
file003.txt:
-file1 003.txt -file2 blah1.txt -outputx x003blah1 -outputy y003blah1
-file1 003.txt -file2 blah2.txt -outputx x003blah2 -outputy y003blah2
-file1 003.txt -file2 blah3.txt -outputx x003blah3 -outputy y003blah3
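
Since the original question mentioned sed, the same idea can also be sketched as a plain shell loop around sed. This is only an illustration under stated assumptions, not something posted in the thread: the scratch directory, the variable names workdir, max, i and n, and the use of mktemp -d are all assumptions, and the sample input is copied from the first post. Note that the pattern 001 matches anywhere on a line, so it would also hit a string such as 1001 if the real data contained one.

```shell
#!/bin/sh
# Sketch only: builds list002.txt .. list300.txt from a sample list001.txt.
# Work in a scratch directory so no existing file is overwritten (assumption).
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample input copied from the first post.
cat > list001.txt <<'EOF'
-file1 001.txt -file2 blah1.txt -outputx x001blah1 -outputy y001blah1
-file1 001.txt -file2 blah2.txt -outputx x001blah2 -outputy y001blah2
-file1 001.txt -file2 blah3.txt -outputx x001blah3 -outputy y001blah3
EOF

max=300   # upper bound named in the question
i=2
while [ "$i" -le "$max" ]; do
    n=$(printf '%03d' "$i")                       # zero-pad: 2 -> 002
    sed "s/001/$n/g" list001.txt > "list$n.txt"   # replace every 001 on each line
    i=$((i + 1))
done
```

Because every output is generated from the untouched list001.txt, a failed or interrupted run can simply be repeated without cleaning up first.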

Hello pseudo.seppuku,

Could you please try the following and let me know if this helps? It is not tested, though.

for file in *.txt; do let "i = i + 1"; awk -v I="$i" '{gsub(/001/,"002",$0);VAL=sprintf("%02d",I);print >> (FILENAME VAL);}' "$file"; done

Thanks,
R. Singh

EGADS. This worked beautifully!

One minor question while I try to work out all the syntax: the two sets of files for comparison (what I previously referred to as 001.txt and blah1.txt) actually have the same numbering system - since there are hundreds of them - but with a different letter prefix for each set, i.e., a001..a300.txt and b001..b300.txt. I only want the numbers starting with a to increase without changing b. Where should I specify/add this prefix?

In other words, the actual lines look more like this:

-file1 a001.txt -file2 b001.txt -outputx xa001b001 -outputy ya001b001
-file1 a001.txt -file2 b002.txt -outputx xa001b002 -outputy ya001b002
-file1 a001.txt -file2 b003.txt -outputx xa001b003 -outputy ya001b003

As you can see, the current script would change the numbers for a (as intended) but also the corresponding number for b. This wouldn't be difficult to cross-check if I only had a few lines, but with hundreds, it becomes somewhat more tedious to correct manually. Sorry for the confusion! I was trying to keep my first post as simple as possible.

---------- Post updated at 07:01 PM ---------- Previous update was at 06:47 PM ----------

Just tried it. The search and replace bit works perfectly. It only creates a single new file though, called file02. However, it's a little strange because if I delete this file and run the script again, the replacement is called file03 (instead of file02 again). Hope that makes sense.

Hello pseudo.seppuku,

That happens because you ran it as a command rather than as a script. When you run it at the prompt, the variable i keeps its value in the shell's memory, and the next run picks up from there. If you save it as a script and run that, this will not happen. As for the file names, I am still a little confused; please show the mapping you need, in the form current_file_name --> new_file_name. I hope this helps.
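
The carry-over described above can be reproduced without any files. The following is a minimal sketch, assuming a POSIX shell; the function name run_once and the variables suffix, first, second and third are made up for illustration, and the arithmetic expansion stands in for the let builtin used in the posted loop.

```shell
#!/bin/sh
# Sketch only: why an uninitialised counter keeps growing between runs.
run_once() {
    i=$(( ${i:-0} + 1 ))            # same effect as: let "i = i + 1"
    suffix=$(printf '%02d' "$i")    # what awk's sprintf("%02d", I) would produce
}

unset i
run_once; first=$suffix     # first run: the counter starts from nothing
run_once; second=$suffix    # run again in the same shell: the old value is still set
i=0                         # explicit reset, similar to starting a fresh script
run_once; third=$suffix

echo "$first $second $third"
```

Putting i=0 in front of the loop gives the same numbering whether it is typed at the prompt or saved as a script.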

Thanks,
R. Singh


Sorry, the first comment (regarding prefixes) was referring to RudiC's script.

I only just started fiddling with Linux recently so that's good to know.

However, what I meant about your script was that it only creates a single new file (file001 with string 001 -> file002 with string 002). I need this done a few hundred times, until file300 with string 300.

It generates a filename dynamically, based on the input filename. Or are you saying each input file creates 300 output files?

Not sure I get it. Do you want the "a" numbers modified inside the file, or do you want to work only on the files whose names start with "a"? In the former case, try gsub (/a001/, "a" TCNT, T); in the latter case, try a*.txt for the file name parameter.
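
For the first case, a rough sketch of the prefix-aware replacement could look like the following. It is an illustration only: the scratch directory, the list prefix for the output names, the loop starting at 2, and MAX=3 are assumptions, and the sample lines are modelled on the ones posted above.

```shell
#!/bin/sh
# Sketch only: change the a-numbers and leave the b-numbers alone.
# Work in a scratch directory so no existing file is overwritten (assumption).
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample input modelled on the lines in the post above.
cat > list001.txt <<'EOF'
-file1 a001.txt -file2 b001.txt -outputx xa001b001 -outputy ya001b001
-file1 a001.txt -file2 b002.txt -outputx xa001b002 -outputy ya001b002
-file1 a001.txt -file2 b003.txt -outputx xa001b003 -outputy ya001b003
EOF

awk -v MAX=3 '
    { ARR[NR] = $0 }                        # buffer every input line
END {
    for (i = 2; i <= MAX; i++) {
        TCNT = sprintf("%03d", i)           # zero-padded counter: 002, 003, ...
        FN = "list" TCNT ".txt"             # hypothetical output naming
        for (j = 1; j <= NR; j++) {
            T = ARR[j]
            gsub(/a001/, "a" TCNT, T)       # only 001 preceded by "a" is replaced
            print T > FN
        }
        close(FN)                           # avoid running out of open files
    }
}' list001.txt
```

Including the letter in the pattern is what protects b001: the regular expression no longer matches a bare 001.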