Need to log the sed command used to replace

tomj5141 · May 4, 2016, 2:44pm

In a shell script I am replacing the asterisks in a file:

sed "s/\*/"0"/g" /home/download/$COMPANY_CODE/file_new > /home/download/$COMPANY_CODE/file

I need to log which positions were replaced & position(01:20) from the line it was replaced in. I am not sure how to do so. Also, instead of replacing all of the asterisks in a file, how can I just replace the asterisks in the file of the columns I am sure need replaced:
position(1050:1059) & position(1249:1258)
And still log it?

Don_Cragun · May 4, 2016, 2:57pm

Are the character positions in the file you are processing numbered with the 1st character on a line being position 0 or position 1?

How big is your input file ( /home/download/$COMPANY_CODE/file_new )?

What operating system are you using?

What shell (including release/version number) are you using?

Please show us (or attach) a few sample input lines and the exact output(s) you want to produce from that input.

tomj5141 · May 4, 2016, 3:17pm

The first position used is 01.
File is 22,000 KB.

AIX server 1 6 00F7E54B4C00
  
      PID    TTY STAT  TIME COMMAND
 19399110 pts/53 A     0:00 -sh

the lines will contain characters, numbers, spaces all through position 1444. Occasionally position(1050:1059) & position(1249:1258) will contain asterisks "**********". I want to convert those to zeros "0000000000". I don't want to convert any other asterisks that may be in the file. These cause issues due to a sql loader trying to insert them into Oracle tables on a numeric field.

I cannot attach actual data.

Don_Cragun · May 4, 2016, 4:02pm

Sample input lines do not need to be actual data; they just need to provide representative data.

Are these text files (no null bytes and a <newline> character at the end of each line) or are they binary files (may contain null bytes, for example in floating point numeric constants, and record length is determined by byte count only instead of containing a line terminating character sequence)?

What codeset is used to encode data in these files? (Perhaps ASCII, ISO-8859-1, UTF-8, or EBCDIC?)

In your 1st post in this thread, you said:

If you want a log (in addition to an output file that is a modified version of the input file), exactly what do you want logged? Please show us a sample of what a log should look like for a particular set of representative sample input records.

Please help us help you. Don't make us guess at everything that will determine what tools can be used to process your data to get the results you want! Making us guess at everything wastes the time of anyone who wants to help you, is likely to get you suggestions that don't stand a chance of working with your real data, and makes it much less likely that you will get any suggestions that work in your environment.

tomj5141 · May 5, 2016, 10:12am

I have attached a sample document. I believe it is ASCII. I've attached it as txt but it is a dat file written in pervasive psql. Best way to view the file is with Ultra Edit.
I need to replace the "**********" with "0000000000" but only when it occurs in position(1050:1059) & position(1249:1258). The other asterisks can stay in the file. After replacing the asterisks this file gets loaded in to Oracle. I need to write to a log the position(01:20) from the line it was replaced in. So in this sample line 13 position 1249:1258 will get replaced. After replacing the asterisks, it needs to write "2y102330" to a log file. The shell script the replace command will be contained in is already creating a log file so maybe just an echo command to show which customers were updated. Echo "position(01:20)" output = "2y102330". I am not sure though.

Don_Cragun · May 5, 2016, 8:57pm

OK. So to be clear, you have a sample input file containing 1,587 lines each containing 1,602 bytes with DOS (<CR><NL>) line terminators and one incomplete line of 1,600 bytes with no line terminator. By definition, as you said, that is not a text file.

Can the modified output from your SAMPLE.txt input file be a real text file, or does the output have to have the same unterminated, trailing partial line?

Does the output file have to use DOS line terminators, or can we change them to UNIX (<NL>) line terminators?

Are you really unable to specify the format of the output you want to produce for the log file that records fields that were changed from containing ********** to containing 0000000000 ?

tomj5141 · May 6, 2016, 7:49am

The last line should be complete & include a line terminator. The actual file to be updated is ~ 14,087 lines depending on the number of customers. The output file can be the same format as the input. The output file just gets loaded into Oracle using SQL LOADER. For the output I do want the reference of position(01:20) from each line changed. Thanks for your assistance.

Don_Cragun · May 6, 2016, 2:47pm

Maybe the following will help you get what you need. It was written and tested using a Korn shell, but will work with any shell that understands basic POSIX required parameter expansions.

If you invoke this script with not operands, it will try to update a file named SAMPLE.txt . If you invoke it with an operand, it will use the 1st operand as the pathname of the file to be updated.

#!/bin/ksh
# Get name of file to be processed:
INFILE=${1:-SAMPLE.txt}
OUTFILE="$INFILE.new"

# Convert the input file to a DOS formatted text file by completing the last
# input line (in case the last input line is incomplete..
printf '\r\n' >> "$INFILE"

# Use awk to search for and, if found, replace strings of asterisks in two
# locations on each line to strings of zeroes, and log any changes made.
# Delete blank lines in case the last input line was complete and we added an
# extraneous, empty, DOS format line.
awk -v OUTFILE="$OUTFILE" '
BEGIN {	logfmt = "Fix applied: Line:%d, Offset:%d, Customer:%s\n"
	spot[++ns] = 1050
	spot[++ns] = 1249
}
/^[[:blank:]]*\r*$/ {
	# Skip blank lines (with DOS or UNIX line terminators).
	next
}
{	# Uncomment next line to change line terminators from DOS to UNIX.
	# sub(/\r$/, "")
	# Search the specified spots in each input line for 10 asterisks...
	for(i = 1; i <= ns; i++)
		if(substr($0, spot, 10) == "**********") {
			# and when found, change them to zeros...
			$0 = substr($0, 1, spot - 1) "0000000000" \
			    substr($0, spot + 10)
			# and log the cange made.
			printf(logfmt, NR, spot, substr($0, 1, 10))
		}
	# Copy the (possibly updated) input line to the output file.
	print > OUTFILE
}' "$INFILE" && cp "$OUTFILE" "$INFILE" && rm -f "$OUTFILE"
# If the conversion succeeded, the above line replaces the contents of the
# input file (to avoid breaking any links to the input file), and if the copy
# succeeds removes the temp file holding the updated input.

If someone else wants to try this on a Solaris/SunOS system, change awk in the script to /usr/xpg4/bin/awk ( nawk won't work for this script).

With the SAMPLE.txt file you provided as an input file, the awk output is:

Fix applied: Line:13, Offset:1249, Customer:2y1023300 
Fix applied: Line:33, Offset:1050, Customer:2a4323413 
Fix applied: Line:34, Offset:1050, Customer:2a4323413 
Fix applied: Line:41, Offset:1050, Customer:2a5133020 
Fix applied: Line:45, Offset:1050, Customer:2a5203011 
Fix applied: Line:46, Offset:1050, Customer:2a5203011 
Fix applied: Line:49, Offset:1050, Customer:2a5231320

and the spots indicated above in SAMPLE.txt are changed from ********** to 0000000000 , and a DOS <CR><NL> is added to the end of SAMPLE.txt to complete the incomplete line.

And, if you uncomment the line shown in red in the script, it will convert DOS format lines in the input file into UNIX format lines.

tomj5141 · May 10, 2016, 10:54am

I am not sure how to run that. I tried just copy & past into Putty but it does not seem to work.
[

#!/bin/ksh
# Get name of file to be processed:
INFILE=${1:-/home/download/east/master.dat}
OUTFILE="/home/download/east/$INFILE.new"
 
# Convert the input file to a DOS formatted text file by completing the last
# input line (in case the last input line is incomplete..
printf '\r\n' >> "$INFILE"
 
# Use awk to search for and, if found, replace strings of asterisks in two
# locations on each line to strings of zeroes, and log any changes made.
# Delete blank lines in case the last input line was complete and we added an
# extraneous, empty, DOS format line.
awk -v OUTFILE="$OUTFILE" '
BEGIN { logfmt = "Fix applied: Line:%d, Offset:%d, Customer:%s\n"
spot[++ns] = 1050
spot[++ns] = 1249
}
/ub(/\r$/, "")
# Search the specified spots in each input line for 10 asterisks...
for(i = 1; i <= ns; i++)
if(substr($0, spot, 10) == "**********") {
# and when found, change them to zeros...
$0 = substr($0, 1, spot - 1) "0000000000" \
substr($0, spot + 10)
# and log the cange made.
printf(logfmt, NR, spot, substr($0, 1, 10))
}
# Copy the (possibly updated) input line to the output file.
print > OUTFILE
}' "$INFILE" && cp "$OUTFILE" "$INFILE" && rm -f "$OUTFILE"^[[:blank:]]*\r*$/ {
# Skip blank lines (with DOS or UNIX line terminators).
next
}
{ # Uncomment next line to change line terminators from DOS to UNIX.
sub(/\r$/, "")
# If the conversion succeeded, the above line replaces the contents of the
# input file (to avoid breaking any links to the input file), and if the copy
# succeeds removes the temp file holding the updated input.]

Don_Cragun · May 10, 2016, 6:25pm

You run it just like you run any other shell script:

You copy the text of the script into a regular file with a name that you choose (let us use the name my_script for this example) using an editor that uses the <newline> character as a line terminator (not the DOS <carriage-return><newline> character pair line terminator).
You execute the command: chmod +x my_script to make your script executable.
And then you run your script.

How you run your script depends on what directory you are in, what directory my_script is in, and what file you want your script to process.

The part of the line in the script:

INFILE=${1:-/home/download/east/master.dat}

shown in red names the file that will be processed when you run this script if you do not specify an operand. If you want to process a file other than this default, you will specify the name of the file you want to process as the command-line argument when you invoke your script. If the file you want to process is in the directory you are sitting in when you run your script, the argument can just be the last component of the file's name; otherwise you will have to supply an argument that is an absolute or relative pathname (relative to the directory you are in when you run the script).

If you are in the directory where your script is located, you can invoke it with:

./my_script

to run it to process the default file, or with:

./my_script /path/of/file

to process a file with the given pathname.

If your script is located in a directory that is on the command search path specified by your PATH environment variable, you can invoke it with just:

my_code

or:

my_code pathname

And, if your script is not in the current directory and is not on your search path, you can invoke it using the absolute pathname of your script or a pathname of script relative to the directory in which you are sitting:

/path/to/my_script

or:

/path/to/my_script file

or:

/path/to/my_script /other/path/to/file

tomj5141 · May 11, 2016, 10:01am

That worked! I was able to call it from my main script & write to the log file for the main script. I was able to send it in an email also!

Thank you!

Don_Cragun · May 11, 2016, 5:50pm

I'm glad it worked for you.

Note that if you find a post particularly helpful in achieving your goal or helping you to understand how to get your shell or operating system to do what you want, you can hit the Thanks button at the lower left corner of that post to express your thanks to the person who submitted it.