date conversion

charandevu · March 29, 2008, 2:59am

file1
E106,0,1/9/1993,0,E001,E003,A,45200,3766.667,21.730769
E108,0,2/3/1995,0,E001,E003,A,15000,1250,7.211538
E109,0,06-mar-07,0,E001,E001,A,78000,6500,37.5
E110,0,09-dec-2008,0,E001,E001,A,56000,4666.667,26.923077
E104,0,06/04/1994,0,E001,E003,A,95000,7916.667,45.673077
E105,0,7/30/1993,0,E001,E003,A,87000,7250,41.826923
E106,0,1/9/1993,0,E001,E003,A,45200,3766.667,21.730769
E108,0,01-feb-2008,0,E001,E003,A,15000,1250,7.211538
E109,0,2/15/1995,0,E001,E001,A,78000,6500,37.5

I want to convert the date format to yyyymmdd for every line in a file.

Please let me know on this....

Thanks in advance
cherry

mirusnet · March 29, 2008, 2:58pm

cat file | sed -E 's/([0-9][0-9])-([[:alpha:]]+)-([0-9]{4})/\3\2\1/'

You would still need a case statement (or similar) to map the month name to a number.

charandevu · March 30, 2008, 7:41am

Thanks for the code.

But I am getting the following error:

-----------------------------------------------------------
sed: illegal option 'E'
usage: sed script [-anW] [file ...]
sed [-anW] [-e script] ... [-f script_file] ...[file ...]
--------------------------------------------------------------

Can you explain the code? Please help me.

redhead · March 30, 2008, 8:04am

cat file | sed -e 's/([0-9][0-9])-([[:alpha:]]+)-([0-9]{4})/$3$2$1/'

era · March 30, 2008, 10:53am

Take out the Useless Use of Cat while you are at it.

I don't think basic sed knows about $1 $2 $3, try with \3\2\1 instead.

sed -e 's/([0-9][0-9])-([[:alpha:]]+)-([0-9]{4})/\3\2\1/' file

But then, if you have a really pedestrian sed, it probably doesn't grok [[:alpha:]] or {4} either.

sed -e 's/([0-9][0-9])-([A-Za-z]+)-([0-9][0-9][0-9][0-9])/\3\2\1/' file

fpmurphy · March 30, 2008, 1:39pm

If you have ksh93 version h or better you can use the printf %T feature to easily do the date format conversion i.e.

#!/usr/bin/ksh93

# split each line on commas; v4 receives the rest of the line
while IFS=',' read -r v1 v2 v3 v4
do
    printf "%s,%s,%(%Y%m%d)T,%s\n" "$v1" "$v2" "$v3" "$v4"
done < file

charandevu · March 30, 2008, 11:03pm

I checked the code, but it is not working:
cat > file6
E106,0,1/9/1993,0,E001,E003,A,45200,3766.667,21.730769
$ cat file6 | sed -e 's/([0-9][0-9])-([[:alpha:]]+)-([0-9]{4})/$3$2$1/'
E106,0,1/9/1993,0,E001,E003,A,45200,3766.667,21.730769

It returns the line in the same format. I also need to validate every line
in the file.

Input to be converted to yyyymmdd format:
E102 0 1/23/1994 0
E104 0 6/4/1994 0
E105 0 7/30/1993 0
E106 0 1/9/1993 0
E108 0 2/3/1995 0
E109 0 2/15/1995 0
E110 0 10/12/1995 0

result
E102 0 19940123 0
E104 0 19940604 0
E105 0 19930730 0
E106 0 19930109 0
E108 0 19950203 0
E109 0 19950215 0
E110 0 19951012 0

charandevu · March 30, 2008, 11:06pm

I am getting this error with era's sed commands:

$ sed -e 's/([0-9][0-9])-([[:alpha:]]+)-([0-9]{4})/\3\2\1/' file6
sed: -e expression #1, char 48: invalid reference \3 on `s' command's RHS

$ sed -e 's/([0-9][0-9])-([A-Za-z]+)-([0-9][0-9][0-9][0-9])/\3\2\1/' file6
sed: -e expression #1, char 57: invalid reference \3 on `s' command's RHS

Please let me know about this code in detail

Thanks for your response

charandevu · March 30, 2008, 11:14pm

Hi, I am getting this error with fpmurphy's ksh93 script:

file7: line 4: printf: `(': invalid format character

Please let me know about this as soon as possible.

Thanks for your response
charan

era · March 30, 2008, 11:46pm

Stupid of me to miss that. Most sed implementations want backslashes on the parentheses.

sed -e 's/\([0-9][0-9]\)-\([A-Za-z]+\)-\([0-9][0-9][0-9][0-9]\)/\3\2\1/' file

And if they're really stupid they don't even understand the +

sed -e 's/\([0-9][0-9]\)-\([A-Za-z][A-Za-z]*\)-\([0-9][0-9][0-9][0-9]\)/\3\2\1/' file

But that just rearranges the order of the fields, it doesn't magically convert month names to numbers. There was another thread on that recently.
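To make the "just rearranges" point concrete, here is a minimal sketch that runs the last expression on two of the sample lines. The function name rearrange_date is made up for this illustration; the sed expression is the one above, unchanged.

```shell
# rearrange_date is a hypothetical wrapper name used only for this illustration.
rearrange_date() {
    printf '%s\n' "$1" |
        sed -e 's/\([0-9][0-9]\)-\([A-Za-z][A-Za-z]*\)-\([0-9][0-9][0-9][0-9]\)/\3\2\1/'
}

rearrange_date 'E110,0,09-dec-2008,0'   # prints E110,0,2008dec09,0
rearrange_date 'E106,0,1/9/1993,0'      # no dd-mon-yyyy date, so the line is printed unchanged
```

The month is still spelled out in the first result, and the slash-separated date is not touched at all.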

The printf format error would seem to indicate you do not have ksh93 version h or better, is that correct? Read what fpmurphy wrote.

The general problem of a myriad different date formats is still not really tackled. I don't know if it could be -- how do you know if 03/12 is December 3rd or March 12th? (In computer-generated dates, dashes are usually an indication of European, aka sane order, whereas slashes tend to be used in American dates; but it depends on the application which generated these. Or are they human-generated? And of course, if they are written by Americans, you can make certain assumptions.)
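The ambiguity can at least be detected. The following is only a sketch in plain POSIX shell: classify_slash_date is a hypothetical helper name, and it assumes the input is two numbers and a year separated by slashes. It reports which reading is forced by a value above 12, and says "ambiguous" when neither is.

```shell
# classify_slash_date is a hypothetical helper; it assumes input shaped like N/N/YYYY.
classify_slash_date() {
    first=${1%%/*}       # text before the first slash
    rest=${1#*/}         # everything after the first slash
    second=${rest%%/*}   # text between the two slashes
    if [ "$first" -gt 12 ] && [ "$second" -gt 12 ]; then
        echo invalid
    elif [ "$first" -gt 12 ]; then
        echo day-first       # first number cannot be a month
    elif [ "$second" -gt 12 ]; then
        echo month-first     # second number cannot be a month
    else
        echo ambiguous       # both readings are possible
    fi
}

classify_slash_date 7/30/1993    # prints month-first
classify_slash_date 03/12/2008   # prints ambiguous
```

Running this over the whole file before converting would show whether the m/d/yyyy assumption is ever contradicted by the data.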

fpmurphy · March 31, 2008, 2:23pm

Then you are not using ksh93 version h or later! Run echo ${.sh.version} to see the ksh93 version string.

era · March 31, 2008, 3:01pm

Just to top off this thread, here's a feeble attempt at converting month names to numbers, in sed.

# hack: start at 100 and drop the first digit to easily get leading zeros where required
c=100
for m in '[Jj]an' '[Ff]eb' '[Mm]ar' '[Aa]pr' '[Mm]ay' '[Jj]un' \
         '[Jj]ul' '[Aa]ug' '[Ss]ep' '[Oo]ct' '[Nn]ov' '[Dd]ec'
do
  c=$((c + 1))
  cat <<__HERE
    s/\([0-9][0-9]\)-$m-\([0-9][0-9][0-9][0-9]\)/\2${c#1}\1/
__HERE
done | sed -f - file.txt
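For comparison, the month lookup on its own can be sketched with awk instead of twelve generated sed rules. This is an alternative technique, not the approach above: month_number is a made-up function name, and a POSIX awk (with tolower and -v) is assumed. It finds the abbreviation inside one long string and derives the number from the position.

```shell
# month_number is a hypothetical helper; it assumes a POSIX awk is installed as "awk".
month_number() {
    awk -v mon="$1" 'BEGIN {
        pos = index("janfebmaraprmayjunjulaugsepoctnovdec", tolower(mon))
        # accept only a three-letter match that starts on a month boundary
        if (length(mon) != 3 || pos == 0 || pos % 3 != 1) exit 1
        printf "%02d\n", (pos + 2) / 3
    }'
}

month_number dec   # prints 12
month_number Mar   # prints 03
```

An unknown name prints nothing and returns a non-zero exit status, so the caller can tell a failed lookup from a valid month.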

charandevu · April 1, 2008, 6:53am

era's month-name code is not working for my data.
file1
E108,0,2/3/1995,0,E001,E003,A,15000,1250,7.211538
E109,0,2/15/1995,0,E001,E001,A,78000,6500,37.5
E110,0,10/12/1995,0,E001,E001,A,56000,4666.667,26.923077

I want to convert this file1 to file2:

E108,0,19950203,0,E001,E003,A,15000,1250,7.211538
E109,0,19950215,0,E001,E001,A,78000,6500,37.5
E110,0,19951012,0,E001,E001,A,56000,4666.667,26.923077

I am then going to do more processing on this file, like sorting and comparing it with other files that use this date format.

Is it also possible to hold the date value in a string variable while reading from a file? I am developing a tool that has to finish processing within a minute, and I am new to Unix.

Please let me know how to do that as soon as possible.

era · April 3, 2008, 3:13am

The code I posted only handles the conversion of abbreviated month names to numbers.

I don't understand "hold the date value in a string variable while reading from a file", I mean yes you can do that, but do you mean read it for each line, or what? Backticks would be a typical solution to this sort of problem, or if you are reading the file line by line in a while loop, then by all means continue to do that.
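If the question is about the while-loop case, a minimal sketch could look like this. The function name print_id_and_date and the variable names are made up; the only assumption about the data is that the date is the third comma-separated field, as in the samples.

```shell
# print_id_and_date is a hypothetical name; it reads comma-separated lines from stdin.
print_id_and_date() {
    while IFS=, read -r id flag date rest
    do
        # "$date" now holds the third field of the current line as a plain string
        printf '%s %s\n' "$id" "$date"
    done
}

printf '%s\n' 'E108,0,2/3/1995,0,E001,E003,A,15000,1250,7.211538' | print_id_and_date
# prints E108 2/3/1995
```

A shell loop like this is slow on large files, so for a tool with a time limit it is probably better to do the whole conversion in a single awk or perl pass.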

I would write a small awk or perl script with heuristics for each possible date format. Only you know how wild the variation is; I have seen three different formats in the samples you have posted, so tackling only those should be doable.

perl -naF, -e 'BEGIN {
  # month abbreviations; "dummy" pads index 0 so that jan is 1
  my @m = qw(dummy jan feb mar apr may jun
    jul aug sep oct nov dec);
  %m = map { $m[$_] => $_ } 1..12;
  $m = join ("|", keys %m);
}
# m/d/yyyy first, then dd-mon-yyyy, then dd-mon-yy (assumed to mean 20yy)
if ($F[2] =~ m%(\d{1,2})/(\d{1,2})/(\d{4})%) {
  $F[2] = sprintf "%04i%02i%02i", $3, $1, $2;
} elsif ($F[2] =~ m%(\d{1,2})-($m)-(\d{4})%i) {
  $F[2] = sprintf "%04i%02i%02i", $3, $m{lc($2)}, $1;
} elsif ($F[2] =~ m%(\d{1,2})-($m)-(\d{2})%i) {
  $F[2] = sprintf "%04i%02i%02i", 2000+$3, $m{lc($2)}, $1;
}
print join (",", @F)' file

This is untested
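The same three heuristics can be sketched in awk, for systems without perl. The function name convert_dates is made up, a POSIX awk is assumed, slash dates are read as m/d/yyyy, and two-digit years are assumed to mean 20yy; those are guesses about the data, not facts.

```shell
# convert_dates is a hypothetical name; it reads the named files, or stdin if none are given.
convert_dates() {
    awk -F, -v OFS=, '
    BEGIN {
        n = split("jan feb mar apr may jun jul aug sep oct nov dec", names, " ")
        for (i = 1; i <= n; i++) month[names[i]] = i
    }
    {
        if (split($3, p, "/") == 3) {
            # m/d/yyyy (American order is an assumption)
            $3 = sprintf("%04d%02d%02d", p[3], p[1], p[2])
        } else if (split($3, p, "-") == 3 && (tolower(p[2]) in month)) {
            # dd-mon-yyyy or dd-mon-yy
            y = p[3] + 0
            if (length(p[3]) == 2) y += 2000
            $3 = sprintf("%04d%02d%02d", y, month[tolower(p[2])], p[1])
        }
        print   # lines with an unrecognized date pass through unchanged
    }' "$@"
}

printf '%s\n' 'E109,0,06-mar-07,0,E001' | convert_dates   # prints E109,0,20070306,0,E001
```

Called as convert_dates file1 > file2, it makes one pass over the input, which should matter more for the one-minute limit than the choice between awk and perl.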

danmero · April 3, 2008, 3:45am

You have another thread for this subject

and vgersh99 already provided the solution: