Replacing last line with awk and change the file name

rohit_shinez · August 21, 2014, 10:28am

Hi Guys,

I have a set of date-stamped, pipe-delimited files on which I am performing the operations below. I need to replace the date value in each file, including the one in the last line, with a specific date.

for eg

f1_20140101.txt
 
aa|aus|20140101|yy
bb|nz|20140101|yy
.
.
3|20140101|zz
 
f2_20140101.txt
 
aa|aus|20140101|yy
bb|nz|20140101|yy
.
.
3|20140101|zz

I need to replace the file names f1_20140101.txt,f2_20140101.txt to f1_20140303.txt, f2_20140303.txt and to replace the last line like below

f1_20140303.txt
 
aa|aus|20140303|yy
bb|nz|20140303|yy
.
.
3|20140303|zz
 
f2_20140303.txt 
aa|aus|20140303|yy
bb|nz|20140303|yy
.
.
3|20140303|zz

The columns whose values change are at fixed positions.

rbatte1 · August 21, 2014, 11:14am

Dear rohit_shinez,

I have a few questions to pose in response first:

What have you tried so far?
What output/errors do you get?
What OS and version are you using?
What are your preferred tools? (C, shell, perl, awk, etc.)
What logical process have you considered? (to help steer us to follow what you are trying to achieve)

Most importantly, What have you tried so far? There are several threads on this sort of thing already, but it depends what OS & version you are using as to whether the features exploited are available.

There are probably many ways to achieve most tasks, so giving us an idea of your style and thoughts will help us guide you to an answer most suitable to you so you can adjust it to suit your needs in future.

We're all here to learn and getting the relevant information will help us all.

Robin

rohit_shinez · August 23, 2014, 10:35am

Hi,

I am using Sun Solaris OS and its a shell script
so far i have tried with below code

for i in f1*.txt
do
awk 'BEGIN {OFS=FS="|"}$3==20140101{$3=20140303}{print}' $i>$1temp
mv $itemp `basename "$i" temp`.20140303
done

But not able to replace the last line of my file and the file name is not getting replaced based on the below output i required

f1_20140303.txt
 
aa|aus|20140303|yy
bb|nz|20140303|yy
.
.
3|20140303|zz
 
f2_20140303.txt 
aa|aus|20140303|yy
bb|nz|20140303|yy
.
.
3|20140303|zz

Don_Cragun · August 23, 2014, 2:19pm

Your desired output is a little ambiguous. Are the filenames included as text on the first line in the files?

Is the number to be replaced in the files a constant, or should it be extracted from the name of the file being processed?

Your for loop is only looking for files starting with f1 , but it looks like you want to process all files starting with f and ending with .txt or maybe starting with f followed by one (or one or more) digits followed by an underscore and any string of 8 digits (or a particular string of 8 digits) followed by and ending with .txt . Please describe in English which set of files should be processed.

If more than one date's input files are to be processed as a set and the output files are supposed to have a single output date, are the numbers between the f and the _ in the output filenames supposed to be adjusted? If so, does the output order matter? Will there ever be more than 9 input or output files for a given date?

RudiC · August 23, 2014, 2:40pm

Apart from what Don Cragun says, two comments on your script:

1. As you want to rename the files anyhow, no temp file is needed (whatever $1temp may be; and the new file name will not be what you specified in post #1).
2. The last line is not modified because the date string is in field 2 there, not in field 3.
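
The second point can be checked against the sample data. The following is only a minimal sketch, assuming a three-line file like the one in the first post; the scratch directory and the variable name result are illustrative and not part of the original script. It runs the same field-3 test as the loop above and shows that the trailer line passes through unchanged.

```shell
# Scratch directory and sample file are illustrative only.
tmpdir=$(mktemp -d) && cd "$tmpdir" || exit 1
printf '%s\n' 'aa|aus|20140101|yy' 'bb|nz|20140101|yy' '3|20140101|zz' > f1_20140101.txt

# Same test as the loop in the earlier post: only field 3 is compared,
# so a line whose date sits in field 2 is printed as it was read.
result=$(awk 'BEGIN { FS = OFS = "|" } $3 == 20140101 { $3 = 20140303 } { print }' f1_20140101.txt)
printf '%s\n' "$result"
```

The first two lines come out with 20140303 in field 3, while the last line is still 3|20140101|zz, because its third field is zz.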

rohit_shinez · August 23, 2014, 3:04pm

Hi,

Let me be clear on the requirement

1. The files to process are those starting with f and ending with .txt, i.e. f*.txt. All file names follow the format f1_YYYYMMDD.txt.
2. The file name needs to be changed to a specific date, e.g. f1_20140101.txt to f1_20140303.txt.
3. Step 2 goes together with replacing the date column with the specific date, say 20130303 here. My input file, i.e. f1_20140101.txt, will have data like this:

aa|aus|20140101|yy
bb|nz|20140101|yy
.
.
3|20140101|zz

the input file will follow the same pattern i need to replace the

3rd dated position column to 20140303 in rest of lines
2nd dated position column to 20140303 in last line of the input file

output some thing like this with file name replaced to f1_20140303.txt

aa|aus|20140303|yy
bb|nz|20140303|yy
.
.
3|20140303|zz

RudiC · August 23, 2014, 3:36pm

You might want to try this straightforward solution, which would satisfy your requirements with your sample data, but might fail with other file structures that you did not mention yet:

OD="20140101"; ND="20140303"
awk -v OD="$OD" -v ND="$ND" '{sub (OD, ND, FILENAME); sub (OD, ND); print > FILENAME}' f*.txt

ls -1 f*.txt
f1_20140101.txt
f1_20140303.txt
f2_20140101.txt
f2_20140303.txt

cat f1_20140303.txt 
aa|aus|20140303|yy
bb|nz|20140303|yy
.
.
3|20140303|zz

rohit_shinez · August 23, 2014, 3:47pm

Hi,

i guess your snippet would replace where 20140101 exist in the file irrespective of field position in the file. The position of replacing the field column is fixed for me i.e. 3rd position in rest of the line of the file and 2nd position in last line of my file.

Don_Cragun · August 23, 2014, 4:06pm

So the f2_20140101.txt in messages #1 and #3 in this thread were just there to confuse us???

RudiC's suggested code ignores your answer here and modifies all matching files shown in the samples in those earlier posts.

Your loop was looking for f1*.txt which implies that there could be a f1_20140101.txt, f1_2040205.txt, and a lot of other files that match the pattern. Will there only be one file that matches that pattern? Or, is your intent to remove all but one of the matching files, and then modify and move the last matching file to use the new date (which is what your script seems to do)?

RudiC's code assumes only one source date exists (even though it doesn't remove the old files after the conversion to the new date).

rohit_shinez:

3.The step 2 condition will be done based on the replacing the dated column to specific date say here to 20130303. My input file i.e f1_20140101.txt will have data like this:
aa|aus|20140101|yy
bb|nz|20140101|yy
.
.
3|20140101|zz
the input file will follow the same pattern i need to replace the

3rd dated position column to 20140303 in rest of lines

2nd dated position column to 20140303 in last line of the input file

output some thing like this with file name replaced to f1_20140303.txt
aa|aus|20140303|yy
bb|nz|20140303|yy
.
.
3|20140303|zz

Will the date to be changed ever appear in any field other than field 3 in any line other than on the last line in the file? Will the date to be changed ever appear in any field other than field 2 in the last line in the file? RudiC's code assumes the answer to both of these is no. (And that is a reasonable assumption given the sample data you've provided. If his assumption is incorrect, you can use different code to handle your data. But we can make simplifying assumptions if you confirm that the date to be changed can never appear in field 1 and only appears in field 2 on the last line of your input files.)

You have said this is to be a shell script. On SunOS systems there are huge differences between sh and ksh , and some minor differences between ksh and bash . Can we use bash or ksh (or /usr/xpg4/bin/sh )?

RudiC's code asks you to fill in the old date manually. If you do that and there is only one old date to process, the shell doesn't matter (as long as it isn't csh or a csh derivative). If you want the script to extract the old date from the name of the file being processed, /bin/sh requires a slower and (at least to some) more complicated method to extract the date from the name than newer shells.
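
To illustrate the difference between the shells, here is a hedged sketch of two ways the date could be pulled out of a file name. The thread does not say which methods were meant, so both are assumptions: parameter expansion for ksh, bash and /usr/xpg4/bin/sh, and an external expr call for the old Bourne shell. The variable names file, date_part and date_expr are made up for the example.

```shell
file=f1_20140101.txt   # hypothetical file name in the format used in this thread

# ksh, bash and /usr/xpg4/bin/sh: parameter expansion, no extra process.
date_part=${file#*_}          # drop everything up to the first underscore
date_part=${date_part%.txt}   # drop the .txt suffix

# Old Bourne shell style: one expr process per file name.
date_expr=`expr "$file" : '.*_\([0-9]*\)\.txt$'`

printf '%s %s\n' "$date_part" "$date_expr"
```

Both variables end up holding 20140101. The expr form starts a new process for every file name, which is why it is the slower of the two.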

rohit_shinez · August 23, 2014, 4:16pm

Hi Don,

Yes i can use bash and ksh.

The dated column on 3rd position of rest all line alone needs to be changed except the last line because in last line, the date appears in 2nd position which needs to be changed with new date.

The same date can appear in any other field position of the file but not be changed something like this

f1_20140303.txt

aa|aus|20140303|yy|20140101
bb|nz|20140303|yy|20140101
.
.
3|20140303|zz

Don_Cragun · August 23, 2014, 4:24pm

rohit_shinez:

Hi Don,

Yes i can use bash and ksh.

The dated column on 3rd position of rest all line alone needs to be changed except the last line because in last line, the date appears in 2nd position which needs to be changed with new date.

The same date can appear in any other field position of the file but not be changed something like this
f1_20140303.txt

aa|aus|20140303|yy|20140101
bb|nz|20140303|yy|20140101
.
.
3|20140303|zz

You didn't answer any of my other questions and you didn't say RudiC's code is failing. So I assume his code is working perfectly for you.

If not, please answer my earlier questions.

rohit_shinez · August 24, 2014, 3:20am

Hi Don,

Yes the code given by RudiC worked just wanted to know how the code checked the position to get replaced the exact position i.e. 3rd in rest and 2nd at last line irrespective of the dated column appeared in different location. I used the below file which worked perfectly

cat f1_20140101.txt
aa|aus|20140101|yy|20140101
bb|nz|20140101|yy|20140101
3|20140101|zz

Output achieved from the code which is what i required

cat f1_20140303.txt
aa|aus|20140303|yy|20140101
bb|nz|20140303|yy|20140101
3|20140303|zz

I believe that since sub will replace the first occurrence thats why its replacing but what it would be the case if the file is something like this

aa|20140101|20140101|yy|20140101
bb|nz|20140101|yy|20140101
3|20140101|zz

in the above case i need to replace only my 3rd position irrespective of any position i find the dated column.

Don_Cragun · August 24, 2014, 4:28am

rohit_shinez:

Hi Don,

Yes the code given by RudiC worked just wanted to know how the code checked the position to get replaced the exact position i.e. 3rd in rest and 2nd at last line irrespective of the dated column appeared in different location. I used the below file which worked perfectly
cat f1_20140101.txt
aa|aus|20140101|yy|20140101
bb|nz|20140101|yy|20140101
3|20140101|zz
Output achieved from the code which is what i required
cat f1_20140303.txt
aa|aus|20140303|yy|20140101
bb|nz|20140303|yy|20140101
3|20140303|zz
I believe that since sub will replace the first occurrence thats why its replacing but what it would be the case if the file is something like this

aa|20140101|20140101|yy|20140101
bb|nz|20140101|yy|20140101
3|20140101|zz

in the above case i need to replace only my 3rd position irrespective of any position i find the dated column.

RudiC's code does not care what field is changed, it just changes the 1st occurrence of $OD it finds on a line (if there are any) to $ND . As I said, your sample data implies that the 2nd field in every line except the last line will be a lowercase alphabetic country code; not an eight digit number that looks like a date in YYYYMMDD or YYYYDDMM format. If your actual input file(s) contain a data field matching $OD before the field you wanted to change (as in your latest example), his code won't do what you want.
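
The difference can be seen on a single line. This is a small sketch using the problem line from the latest example; the variable names line, by_sub and by_field are only for illustration. The first command uses sub() as in RudiC's code, and the second compares and assigns field 3 explicitly.

```shell
OD=20140101 ND=20140303
line='aa|20140101|20140101|yy|20140101'

# sub() rewrites the first match on the line, which here is field 2.
by_sub=$(printf '%s\n' "$line" |
	awk -v OD="$OD" -v ND="$ND" '{ sub(OD, ND); print }')

# Comparing and assigning a field touches only that field.
by_field=$(printf '%s\n' "$line" |
	awk -v OD="$OD" -v ND="$ND" 'BEGIN { FS = OFS = "|" } $3 == OD { $3 = ND } { print }')

printf '%s\n%s\n' "$by_sub" "$by_field"
```

The sub() version prints aa|20140303|20140101|yy|20140101, while the field version prints aa|20140101|20140303|yy|20140101.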

If RudiC's code is not sufficient for what you want, you need to answer my questions so we have a complete understanding of what this code is supposed to do.

rohit_shinez · August 24, 2014, 4:38am

Hi Don,

Please find my answers for your questions:

1. Yes, the date may appear in any field, but I want it replaced in the 3rd position alone, and only if it matches the old date.
2. No, in the last line the date will not appear anywhere other than the 2nd position. This pattern applies only to the last line.

Don_Cragun · August 24, 2014, 4:43am

Please answer ALL of my questions. Trying to write and rewrite code for you from incomplete specifications is a waste of all of our time!

rohit_shinez · August 24, 2014, 4:56am

don cragun:

So the f2_20140101.txt in messages #1 and #3 in this thread were just there to confuse us???

RudiC's suggested code ignores your answer here and modifies all matching files shown in the samples in those earlier posts.

Your loop was looking for f1*.txt which implies that there could be a f1_20140101.txt, f1_2040205.txt, and a lot of other files that match the pattern. Will there only be one file that matches that pattern? Or, is your intent to remove all but one of the matching files, and then modify and move the last matching file to use the new date (which is what your script seems to do)?

RudiC's code assumes only one source date exists (even though it doesn't remove the old files after the conversion to the new date).

Will the date to be changed ever appear in any field other than field 3 in any line other than on the last line in the file? Will the date to be changed ever appear in any field other than field 2 in the last line in the file? RudiC's code assumes the answer to both of these is no. (And that is a reasonable assumption given the sample data you've provided. If his assumption is incorrect, you can use different code to handle your data. But we can make simplifying assumptions if you confirm that the date to be changed can never appear in field 1 and only appears in field 2 on the last line of your input files.)

You have said this is to be a shell script. On SunOS systems there are huge differences between sh and ksh , and some minor differences between ksh and bash . Can we use bash or ksh (or /usr/xpg4/bin/sh )?

RudiC's code asks you to fill in the old date manually. If you do that and there is only one old date to process, the shell doesn't matter (as long as it isn't csh or a csh derivative). If you want the script to extract the old date from the name of the file being processed, /bin/sh requires a slower and (at least to some) more complicated method to extract the date from the name than newer shells.

Hi Don,

Please find my answers sorry to bother you again and clear on my requirement:

1. Since I am using f1*.txt there will be only files with same dated format pattern like f1_20130101.txt, f2_20130101.txt, f3_20130101.txt
2. As per RudiC, it shouldn't move the older files, which is what I also need; just redirect to another file with the new dated format
3. Coming on to replacing the date in file:
   a. Yes, the date may appear in any field, but I want the date to be replaced in the 3rd position alone if it matches the old date
   b. No, the date will not appear anywhere in the last line other than the 2nd position. This pattern is only for the last line
4. The script can be written in ksh/bash
5. The old date can be entered manually as in RudiC's code, which will fit my requirement

Apart from this as I said earlier RudiC's code worked as per requirement only but problem is since sub is being used it will replace the first occurrence but as I mentioned in point a

RudiC · August 24, 2014, 5:36am

So - why don't you adapt the proposal to fit to your needs? You had an awk statement already in one of your early posts, so you should be able to join the two together to get at the perfect solution.

Don_Cragun · August 24, 2014, 7:05am

Looking back at the other 22 threads you have started and the suggestions that have been given to you, I agree with RudiC that you should have everything you need to solve this problem (although doing all of it in a single awk script is a little trickier than most of the requests in your other threads).

Please try adapting RudiC's suggestion to meet your needs and then let us see how close you can come to making it work.

And, in your last response you said "1.Since i am using f1*.txt , there will be only files with same dated format pattern like f1_20130101.txt,f2_20130101.txt,f3_20130101.txt". Which makes no sense to me. If you're using the pattern f1*.txt , that will match f1_20130101.txt, f1_20140101.txt and f1_20140303.txt (all of which have been listed as input or output files in this thread). It will never match any filename starting with f2_ or f3_ . RudiC's code used the pattern f*.txt which will match every number before the underscore and every date. I still don't understand which files you want to change! (I would have guessed you wanted to process all files that match the pattern f*_"$OD".txt , but that is not supported by any of your stated requirements nor by the code you supplied.)
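
The three patterns discussed here can be compared directly. The sketch below assumes a scratch directory holding empty files named after those mentioned in this thread; the directory and the variable names only_f1, all_f and old_date are illustrative.

```shell
# Scratch directory with empty files named like those in this thread.
tmpdir=$(mktemp -d) && cd "$tmpdir" || exit 1
OD=20140101
touch f1_20130101.txt f1_20140101.txt f1_20140303.txt f2_20140101.txt f3_20140101.txt

only_f1=$(echo f1*.txt)        # every date, but only names starting with f1
all_f=$(echo f*.txt)           # every number and every date
old_date=$(echo f*_"$OD".txt)  # every number, but only the old date

printf '%s\n' "f1*.txt: $only_f1" "f*.txt: $all_f" "f*_\$OD.txt: $old_date"
```

Only the last pattern selects exactly f1_20140101.txt, f2_20140101.txt and f3_20140101.txt.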

rohit_shinez · August 24, 2014, 7:13am

Hi ,

Yes Don the file should process with f*_"$OD".txt

Tried with the below one but getting an error stating illegal statement

OD="20140101"; ND="20140303"
awk -v OD="$OD" -v ND="$ND" '{sub (OD, ND, FILENAME);$3==OD{$3=ND};print >FILENAME}' f*.txt
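
A note on the error itself: in awk the pattern { action } form is only valid at the top level of the program, so writing $3==OD{$3=ND} inside another action is a syntax error. Inside an action the condition has to be written with if. The sketch below shows only that syntactic repair, with FS and OFS set to the pipe character (which the attempt above also lacks). It reads two sample lines from a pipe instead of the real files, and it still does nothing about field 2 on the last line. On Solaris, nawk or /usr/xpg4/bin/awk may be needed for -v support.

```shell
OD=20140101 ND=20140303

fixed=$(printf '%s\n' 'aa|20140101|20140101|yy' '3|20140101|zz' |
awk -v OD="$OD" -v ND="$ND" '
BEGIN { FS = OFS = "|" }
{
	if ($3 == OD)   # inside an action, a condition needs if
		$3 = ND
	print
}')
printf '%s\n' "$fixed"
```

The first line becomes aa|20140101|20140303|yy, and the trailer line is left as 3|20140101|zz.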

Don_Cragun · August 24, 2014, 7:45am

rohit_shinez:

Hi ,

Yes Don the file should process with f*_"$OD".txt

Tried with the below one but getting an error stating illegal statement
OD="20140101"; ND="20140303"
awk -v OD="$OD" -v ND="$ND" '{sub (OD, ND, FILENAME);$3==OD{$3=ND};print >FILENAME}' f*.txt

I am very disappointed in your attempt. You say your script should process f*_"$OD".txt , but your script processes f*.txt . You say your script should update field 3 in some lines and field 2 in the last line. Your script makes no attempt to distinguish between the last line and any other line, and never looks at or updates field 2.

Try something like this instead:

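#!/bin/ksh
IAm=${0##*/}
if [ $# -ne 2 ]
then	printf 'Usage: %s old_date new_date\n' "$IAm" >&2
	printf 'where old_date and new_date are in the format YYYYMMDD\n' >&2
	exit 1
fi
awk -v OD="$1" -v ND="${2}" '
BEGIN {	FS = OFS = "|"
}
# Write out the previous line. If we are now on the first line of a new
# file, the previous line was the last line of the previous file, so use
# the copy with field 2 updated; otherwise use the copy with field 3 updated.
NR > 1 {if(FNR == 1)
		print uf2 > newfile
	else	print uf3 > newfile
}
# At the start of each input file, close the previous output file and
# derive the output file name from the input file name.
FNR == 1 {
	if(NR > 1)
		close(newfile)
	newfile = FILENAME
	sub(OD, ND, newfile)
}
# Build both candidate versions of the current line:
# uf2 has field 2 updated, uf3 has field 3 updated.
{	if($2 == OD) {
		save = $0
		$2 = ND
		uf2 = $0
		$0 = save
	} else	uf2 = $0
	if($3 == OD)
		$3 = ND
	uf3 = $0
}
# The last line of the last file gets the field 2 update.
END {	print uf2 > newfile
}' f*_"$1.txt"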

The awk script could be simplified considerably if your shell script would determine the number of lines in each input file before invoking awk , but this more complex script gets rid of the need to have awk read each file twice or to have the shell or wc read the file to determine the line number of the last line in each file.
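
For comparison, the simpler two-read variant mentioned above might look something like the following. This is only a sketch under stated assumptions, not the script from this post: wc -l reads each file once to find the last line number, the files end with a newline, and the sample file and the variable names old, new and last are made up for illustration. The input file is left in place and the result goes to a file carrying the new date.

```shell
# Scratch directory and sample input are illustrative only.
tmpdir=$(mktemp -d) && cd "$tmpdir" || exit 1
OD=20140101 ND=20140303
printf '%s\n' 'aa|20140101|20140101|yy|20140101' 'bb|nz|20140101|yy' '3|20140101|zz' > "f1_$OD.txt"

for old in f*_"$OD".txt
do
	[ -f "$old" ] || continue      # the pattern matched nothing
	new=${old%_$OD.txt}_$ND.txt    # same prefix, new date
	last=$(wc -l < "$old")         # first read of the file: count its lines
	awk -v OD="$OD" -v ND="$ND" -v last="$last" '
	BEGIN { FS = OFS = "|" }
	# Last line: only field 2 is a candidate for replacement.
	FNR == last + 0 { if ($2 == OD) $2 = ND; print; next }
	# Every other line: only field 3 is a candidate.
	{ if ($3 == OD) $3 = ND; print }
	' "$old" > "$new"
done
cat "f1_$ND.txt"
```

Each file is read twice here (once by wc, once by awk), which is the cost the single-pass script above avoids.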