Use of awk and printf - help needed

I have a very large file with more than 500,000 lines of dated events.

The first field contains the date/time in the following format:
20120727-files.files:20120727090044

where the first 8 digits represent yyyymmdd. The last set of digits represents yyyymmddhhmmss.

I would like to change the value of the first field to:
YYYY-MM-DD HH:MM:SS

I have no clue how to do this. I started with displaying the first field only to simplify things.

awk -F"|" '{ print $1 }' infile 

I've tried to find a cheatsheet for printf but couldn't find anything good. Is printf universal across many languages? Or is the printf in bash unique to bash?

Thanks for reading..

-David

The printf utility is not unique to bash; it is a standard utility present on any system that supports the common set of utilities defined by the POSIX standards and the Single UNIX Specification. But, if you're using awk to grab the 1st field out of your input file, you might as well just do all of the work in awk. If you just want to change the format of the first field in your file and print the updated line, you can use:

awk '
BEGIN { FS = OFS = "|"}
{       l=length($1)
        $1 = sprintf("%s-%s-%s %s:%s:%s", substr($1, l - 13, 4),
                substr($1, l - 9, 2), substr($1, l - 7, 2),
                substr($1, l - 5, 2), substr($1, l - 3, 2), substr($1, l - 1))
        print
}' infile 
exit

If you just want to print the dates and ignore the rest of the fields in infile, you can shorten this to:

awk -F '|' '{
        l=length($1)
        printf("%s-%s-%s %s:%s:%s\n", substr($1, l - 13, 4),
                substr($1, l - 9, 2), substr($1, l - 7, 2),
                substr($1, l - 5, 2), substr($1, l - 3, 2), substr($1, l - 1))
}' infile 
exit
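
On the printf question: the format strings behave much the same in the shell's printf utility, in awk's printf and sprintf, and in C's printf. As a rough sketch (the sample values and the helper name fmt_demo are made up for illustration, they are not from your file), the same format string used above can be tried straight from the shell:

```shell
# fmt_demo is a hypothetical helper name, used only for this illustration.
fmt_demo() {
    # Shell printf: each %s is replaced by the next argument, as a string.
    printf '%s-%s-%s %s:%s:%s\n' 2012 07 27 09 00 44
    # awk printf: same format string, arguments separated by commas.
    awk 'BEGIN { printf("%s-%s-%s %s:%s:%s\n", "2012", "07", "27", "09", "00", "44") }'
    # %5d right-aligns in 5 columns, %-5s left-aligns, %03d pads with zeros.
    printf '[%5d] [%-5s] [%03d]\n' 42 ab 7
}
fmt_demo
```

The first two lines of output should both be 2012-07-27 09:00:44, and the third should be [   42] [ab   ] [007]. Note that printf never adds a newline on its own; the \n in the format string does that.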

That's amazing.

It worked the first time around. I really appreciate your help on this.

If you could answer a question... I'm having trouble finding a tutorial or information on the "l - 13" or "l - 9" portion of the code.

I read the awk man page entry for substr (which is vague and confusing) and I'm just curious how I can make myself fluent in this language without having to pull my hair out and ask a billion questions.... If you could give me some insight on that, I'd really appreciate it.

-David

Happy New Year!

I KIND OF understand what it's saying but not quite.

substr($1, l - 13, 4)

The above looks like it's starting the count, 4 from the end, 13 to the left, which goes over 2012.

substr($1, l - 9, 2)
The above starts the count 2 from the end, 9 to the left, which is right between the 2012 and 12, which seems perfect. Am I missing something here? (obviously I am.)

Hi David,
The awk command l=length($1) sets l to the number of characters in the first field. In your example, this sets l to 35 because there are 35 characters in "20120727-files.files:20120727090044" (not counting the terminating null byte). The function substr(x, s, c) returns c characters from the string x, starting with the sth character (awk counts characters from 1). The timestamp is the last 14 characters of the field, so it starts at character l - 13; substr($1, l - 13, 4) therefore returns 4 characters from 20120727-files.files:20120727090044 starting with the 22nd character (i.e., "2012"). Likewise, l - 9 is the 26th character, which is where the two-digit month starts. When a count isn't given (as in substr($1, l - 1)), substr() returns the remainder of the string from the given starting point. I did it this way because I assume that the files.files portion of your input lines may vary in length, but we know that the timestamp is always the last 14 characters in the string.
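
If it helps to see the arithmetic, here is a small sketch that prints each piece separately for your sample field. The helper name show_offsets and the labels are only for illustration; the offsets are the same ones used in the script above.

```shell
# Sample first field from the question; the text before ":" may vary in length.
field='20120727-files.files:20120727090044'

# show_offsets is a hypothetical helper name, used only for this illustration.
show_offsets() {
    printf '%s\n' "$1" | awk '{
        l = length($0)                          # 35 for the sample field
        print "l=" l
        print "year="   substr($0, l - 13, 4)   # 4 characters from character 22
        print "month="  substr($0, l - 9, 2)    # 2 characters from character 26
        print "day="    substr($0, l - 7, 2)    # 2 characters from character 28
        print "hour="   substr($0, l - 5, 2)    # 2 characters from character 30
        print "minute=" substr($0, l - 3, 2)    # 2 characters from character 32
        print "second=" substr($0, l - 1)       # no count: the rest of the string
    }'
}
show_offsets "$field"
```

For the sample field this should print l=35, year=2012, month=07, day=27, hour=09, minute=00 and second=44, one per line. So the second argument is a starting position counted from the left, and the third argument is how many characters to take from there.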

awk -F":" '{print substr($2,1,4)"-"substr($2,5,2)"-"substr($2,7,2)" "substr($2,9,2)":"substr($2,11,2)":"substr($2,13,2)}' your_file

Regards,
Vijay