Performing 2 searches and 1 replace with sed

carya2 · September 29, 2023, 2:41pm

I am trying to change an input file that requires 2 searches per paragraph. The input file looks like this:

record1=sam,cn=users
attribute1=true
attribute2=false

record2=john,cn=users
attribute1=true
attribute2=false

The script would have to find the first record line, change attribute2 from false to true within that paragraph, then move on to the next record and do the same. I was thinking I could put all the records in a file and run that through a loop, but I'm not sure how to tell sed to search for the record, then search for the attribute within the same paragraph and change attribute2 from false to true. It is easy enough to change every occurrence of attribute2 from false to true, but not every record needs attribute2 changed. For some records it should be true, and for others it should be false. I only need to change the ones that need to be changed.
The file has about 5 million records and only a few hundred thousand need to have attribute2 changed.

Any help anyone can give me would be greatly appreciated.

vgersh99 · September 29, 2023, 2:54pm

@carya2, welcome to the community!
Going forward, please use markdown code tags when posting data/code samples - the markdown code tags are described here.
I've edited your post for now.

Does it have to be sed? If so, why?
How do you know which records need modification and which don't?

carya2 · September 29, 2023, 3:35pm

No, it doesn't have to be sed. It's just what came to mind first. Yes, I have a list of which records need the modification and which don't. The file has a total of 5,463,329 records, and some of the attributes only need to be changed on about 105k records, some about 850k and some need it changed on every record. It just depends on what the attribute is. The value isn't always true/false. Sometimes it's a date stamp (which I do have available - it's a pair of the record name, the attribute to be changed, and the new value for the attribute).

munkeHoller · September 29, 2023, 4:23pm

@carya2, hi, have you got any code that you've tried?
What does the control file look like for selecting candidates to be modified?

carya2 · September 29, 2023, 8:12pm

One thought could be to just delete the attribute and then add it back with the right value?
The delete is trivial so I will skip that. Once the deletes are done, to add them back in with the right values I could do something like:

while read line
do
    sed -i -e "/^$line/a attribute1=true" input_file.txt
done < records.txt

but I need something to query another file with the correct values for attribute1 for the record rather than just blindly setting it to true. So I need to read one file that has the first line of the paragraph defined, and another with the correct value for the attribute.

AGG2020 · September 29, 2023, 9:17pm

What is the format of the second file that provides the rules? With requirements this unclear about how each key's value should be modified, you will not be able to receive any practical help.
If I had to do it, I could do it with Python with just a few lines of code, even though it can be done the same way with a Bash script.

MadeInGermany · September 30, 2023, 12:05am

Say your records.txt has

record2=attribute1=true

Then you could do

while IFS="=" read record attrib value
do
    sed -i -e '/^'"$record"'/,/^$/!b' -e '/^'"$attrib"'=/ s/=.*/='"$value"'/' input_file.txt
done < records.txt

Explanations:
The read reads the triplet into 3 variables.
The shell concatenates the 'constant strings' and the "$variables" to form the sed code.
The sed code detects the record block by a /^record2/,/^$/ (from record2 till an empty line) and rules out everything else, then searches ^attribute1= and substitutes the =.* match by =true
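
To make the quoting easier to follow, here is a small sketch that prints the two sed commands the shell assembles for the sample line record2=attribute1=true. The variable names range_cmd and subst_cmd are only for illustration and are not part of the loop above.

```shell
#!/bin/sh
# Values that read would assign for the line: record2=attribute1=true
record=record2
attrib=attribute1
value=true

# 'constant strings' and "$variables" are concatenated into one word each.
# range_cmd and subst_cmd are illustrative names only.
range_cmd='/^'"$record"'/,/^$/!b'
subst_cmd='/^'"$attrib"'=/ s/=.*/='"$value"'/'

# Print the two commands exactly as sed would receive them via -e.
printf '%s\n' "$range_cmd" "$subst_cmd"
# prints:
# /^record2/,/^$/!b
# /^attribute1=/ s/=.*/=true/
```

Note that /^record2/ also matches a line starting with record20, so a stricter pattern may be needed if record names can be prefixes of each other.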

A better suited tool (than sed) could take the shell variables as command arguments, much cleaner!
Further, the tool might process the records.txt file and read all to memory, then process the target file in one stroke - much faster!
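
As a rough sketch of that idea, awk could load records.txt into memory and then rewrite the target file in a single pass. This assumes the record=attribute=value format from above and paragraphs separated by empty lines; update_attrs is a made-up function name, and the real control file may well look different.

```shell
#!/bin/sh
# Sketch only: update_attrs is a made-up helper name, not an existing tool.
# Assumes each line of the records file is record=attribute=value and that
# paragraphs in the data file are separated by empty lines.
update_attrs() {
    awk -F= '
        # First file: remember the new value for each record/attribute pair.
        # substr keeps everything after the second "=", so values may contain "=".
        NR == FNR { new[$1, $2] = substr($0, length($1) + length($2) + 3); next }
        # An empty line ends the current paragraph.
        /^$/ { rec = ""; print; next }
        # The first line of a paragraph names the record.
        rec == "" { rec = $1; print; next }
        # Attribute line with a pending update: print it with the new value.
        (rec, $1) in new { print $1 "=" new[rec, $1]; next }
        # Everything else passes through unchanged.
        { print }
    ' "$1" "$2"
}
```

It could be called as update_attrs records.txt input_file.txt > output_file.txt, which reads each file once instead of running sed over the whole target file for every line of records.txt. The NR == FNR test assumes records.txt is not empty, and the array of updates has to fit in memory.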
