How can I apply 'date' command to specific columns, in a BASH script?

richardsantink · June 14, 2018, 11:17am

Hi everyone,

I have a situation in which I have multiple (3 at last count) date columns in a CSV file (, delim), which need to be changed from:
January 1 2017 (note, no comma after day)
to:
YYYY-MM-DD

So far, I am able to convert a date using:
date --date="January 12, 1990" +%Y-%m-%d

However, I need to read a specific column in the CSV, then replace what's there with the new format. It doesn't matter which commonly used GNU utility I use, I just need to do it.

So far, I have been trying to write an AWK function in a BASH shell script to do it, but I'm completely lost.

Can anyone offer suggestions on the easiest way to accomplish this from within a BASH script? The attached sample.txt gives an idea of what the data looks like, only greatly reduced. The really data has about 3000 rows and 35 columns.

Thanks,

Rich

vgersh99 · June 14, 2018, 11:34am

what's your OS/awk version?
Do you have gawk?

richardsantink · June 14, 2018, 11:42am

GNU Awk 4.1.3, API: 1.1 (GNU MPFR 3.1.4, GNU MP 6.1.0)

Using Ubuntu on Windows 10 Enterprise in the Linux subsystem.
Thanks,

R

vgersh99 · June 14, 2018, 12:16pm

awk -f rich.awk myInput.csv where rich.awk is:

BEGIN {
 FS=OFS=","
 mon="January|February|March|April|May|June|July|August|September|October|November|December"
   monN=split(mon, monA, "|");
   for(i=1; i<=monN; i++) {
     monA[monA]=i;
     delete monA;
   }

   if (!col) col="3"

   colN=split(col, colA, " ")
}

function conv(str,  strA)
{
   split(str, strA, " ")
   return(sprintf("%d-%02d-%02d", strA[3], monA[strA[1]], strA[2]))
}

FNR>1 {
   for(i=1; i<= colN; i++)
     $colA=conv($colA)
}
1

By default this will convert column 3. You can specify multiple columns to be converted by:
awk -v col='3 5 whateverColNum' -f rich.awk myInput.csv

RudiC · June 14, 2018, 12:36pm

You wanted to use the date command ... try

grep -Eo "[A-Z][a-z]* [0-9]{1,2} [0-9]{4}" /tmp/sample.txt |
LC_ALL=C date -f- +"%B %-d %Y:%F" |
awk '
NR == FNR       {T[$1] = $2
                 next
                }
FNR > 1         {$3 = T[$3]
                 $5 = T[$5]
                }
1
' FS=: - FS=, OFS=, /tmp/sample.txt

richardsantink · June 14, 2018, 1:06pm

Thank you both! I will try both approaches.

R

Scrutinizer · June 14, 2018, 6:37pm

Another variety:

awk '
  BEGIN {
    FS=OFS=","
    split("January,February,March,April,May,June,July,August,September,October,November,December", MNUM)
    for(i in MNUM)
      MON[MNUM]=i                                                                         
  }                          
  {
    for(i=1; i<=NF; i++)
      if ($i ~ /^[[:alpha:]]+ [0-9]{1,2} [0-9]{4}$/) {
        split($i, D, " ")
        $i=sprintf("%d-%02d-%02d",D[3],MON[D[1]],D[2])
      }                                               
    print
  }      
' file

RudiC · June 15, 2018, 2:12am

If necessary, you can adapt above to your locale settings:

awk '
BEGIN   {"locale mon" | getline TMP
         for (i=split(TMP, MNUM, ";"); i; i--)  MON[MNUM]=i
        }                          
'