Here Date format should be in mm/dd/yyyy but file has date format in different format - m/dd/yyyy or mm/d/yyyy or m/d/yyyy. If the format is not correct then a leading zero should be added to make it in mm/dd/yyyy like 4/5/2016 to 04/05/2016.
I tried few awk printf commands but nothing works out for multiple occurrences within the line.
Please let me know your possible way to crack this scenario.
It looks to me as though when it's a header ($1 == 'H') you are looking at fields 5 and 6 ($5 and $6). When it's a detail line you are looking at fields 4 and 5.
I don't know awk but is seems to me you need a construct like:
if ($1 == 'H') {
# code for $5
# identical code but for $6
} elsif ($1 == 'D') {
# code for $5
# identical code but for $4
}
As for the modification for the date you need to split on '/' and rebuild using printf.
I don't really know awk so the above construct is probably nonsensical.
Concentrate on modifying field 5 and then try putting the above construct in for the other fields.
In your sample there seem to be no other data similar to dates, so the following abbreviation may (or may not, depending on how representative your sample is) work:
Replace every instance of a single digit followed by a slash by this sequence with a "0" prepended. This would amount to a simple sed -command:
sed 's/\([^0-9]\)\([0-9]\/\)/\10\2/g' /path/to/your/file
This will replace in i.e. "|4/5/2016" the "|4/" with "|04/" and then the "/5/" to "/05/".
Thank you for nice code. I think you could change (match ($field, ".*/.*/.*")) to (match($field,/[0-9]+\/[0-9]+\/[0-9]+/)) , which will get only digits(for matching dates only).
A suite of programs dealing with many aspects of dates is dateutils, found in many repositories ( ArchLinux, Debian, Fedora, FreeBSD, Gentoo, NetBSD, OpenSuSE, OS, Slackware, Ubuntu ).
Most of the codes scan the input lines for data that looks like a date format, then processes it. In this case we ask dconv to read it and write it with the same format:
$ ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution : Debian 8.4 (jessie)
bash GNU bash 4.3.30
dateutils.dconv dconv 0.3.1
/usr/bin/dateutils.dconv
-----
Input data file data1:
H|123456|Joes Watson|UK|4/5/2016|12/5/2016|3456|HC|NW||||||
D|123456|Joes Watson|4/5/2016|12/5/2016|LR|LW||||||||||||||||||
D|123456|Joes Watson|4/5/2016|6/11/2016|WR|LW||||||||||||||||||
H|123457|Jack Roes|UK|11/11/2015|12/11/2015|3456|HC|NW||||||
D|123457|Jack Roes|1/5/2016|1/5/2016|LR|LW||||||||||||||||||
D|123457|Jack Roes|4/5/2016|6/12/2016|WR|LW||||||||||||||||||
-----
Results:
H|123456|Joes Watson|UK|04/05/2016|12/05/2016|3456|HC|NW||||||
D|123456|Joes Watson|04/05/2016|12/05/2016|LR|LW||||||||||||||||||
D|123456|Joes Watson|04/05/2016|06/11/2016|WR|LW||||||||||||||||||
H|123457|Jack Roes|UK|11/11/2015|12/11/2015|3456|HC|NW||||||
D|123457|Jack Roes|01/05/2016|01/05/2016|LR|LW||||||||||||||||||
D|123457|Jack Roes|04/05/2016|06/12/2016|WR|LW||||||||||||||||||
If not in your repository or you cannot change your system, see the github source -- that is how I originally obtained the codes until dateutils appeared in my repository.
This is because the first match on the BRE is |1/ and you want to match /5/ to change the 5 to 05 , but the leading / for the 2nd match was consumed as the trailing / in the previous match.
This can be fixed by applying that substitute command twice:
sed -e 's/\([^0-9]\)\([0-9]\/\)/\10\2/g' -e 's/\([^0-9]\)\([0-9]\/\)/\10\2/g' /path/to/your/file
or with a slightly simpler second substitute command:
sed -e 'sx\([^0-9]\)\([0-9]/\)x\10\2xg' -e 'sx/\([0-9]/\)x/0\1xg' /path/to/your/file
(using x as the substitution delimiter instead of / to avoid a few backslashes).
Note that this sed solution will also work if a date field also includes time (e.g., 1/5/2016 11:23:45 ) while the awk solutions suggested will fail in this case. And note that this sed solution may "fix" too much if there is a field containing a simple fraction where the numerator is a single digit (e.g., 1/2 off will become 01/2 off and 2/3 majority will become 02/3 majority ). None of these are problems with the given sample input, but with all of those empty fields in the sample we have to wonder what else might appear in those field in other input lines???
Some sed versions do have the \< and \> boundary anchors; they do not consume a character. And not only allow a /g in one go but also make the substitution easier.
sed 's#\<[0-9]/#0&#g'
--
Just seeing that the Solaris /usr/xpg4/bin/sed counts the \< as a character that is truely a bug (/usr/bin/sed works).
Why do you think this is a bug in /usr/xpg4/bin/sed ? The standards say that the REs accepted by sed are Basic Regular Expressions (BREs) with three minor tweaks that don't affect this command. In a BRE the meaning of a backslash followed by a character that is not a BRE special character, a <left-parenthesis>, <right-parenthesis>, <left-curly-bracket>, <right-curly-bracket>, the current delimiter character (the character that delimits the BRE and replacement from the command and flags), and the decimal digits 1 through 9 is undefined (and all of these are valid after a <backslash> only in certain contexts). The BRE special characters are <period>, <left-square-bracket>, <backslash>, <asterisk>, <circumflex>, and <dollar-sign> and all of these are also only special in certain contexts.
According to the standards, using \< or \> in a BRE can be treated by the implementation as a request to match a < or > , respectively; match a word boundary; abort with a BRE syntax error; hunt down and kill the programmer who wrote that BRE, or anything else it chooses to do. (Legal and marketing implications make it unlikely that an implementation will make the next to the last choice in that list. )
The Solaris man page for both /usr/bin/sed and /usr/xpg4/bin/sed refers to the regexp man page, that states
...
3.1 \< constrains a RE to match the beginning of
a string or to follow a character that is
not a digit, underscore, or letter. The
first character matching the RE must be a
digit, underscore, or letter.
Even if it does not explicitly say that the \< should never count as a character, the implementations in /usr/bin/sed and /usr/xpg4/bin/sed should be consistent.
Hi MadeInGermany,
OK. Yes, the fact that /usr/xpg4/bin/sed does not behave as documented by the Solaris sed man page is definitely a bug. (It isn't a standards conformance issue, but it is a bug.)