Format a date on additional column awk

benchin · March 5, 2019, 11:44am

Hi,
My goal is to take the following data from HB.txt

05/20/1997,1130,5.93,5.96,5.93,5.96,49200
05/20/1997,1131,5.96,5.96,5.9,5.93,252400
05/14/1997,1132,5.93,5.99,5.93,5.99,89600
05/15/1997,1133,5.93,5.93,5.71,5.74,203200

and add a day-of-week column derived from the first column (the date).

05/14/1997,1132,5.93,5.99,5.93,5.99,89600,Wed
05/15/1997,1133,5.93,5.93,5.71,5.74,203200,Thu
05/20/1997,1130,5.93,5.96,5.93,5.96,49200,Tue
05/20/1997,1131,5.96,5.96,5.9,5.93,252400,Tue

So far, I have tried this:

awk  'BEGIN{RS="\r"; FS=",";}{printf $0 "," $1;}' HB.txt > newHB.csv

However, in the generated output the new column ends up after a line break.

05/14/1997,1132,5.93,5.99,5.93,5.99,89600,05/14/1997
05/15/1997,1133,5.93,5.93,5.71,5.74,203200
05/20/1997,1130,5.93,5.96,5.93,5.96,49200,
05/15/1997
05/20/1997,1131,5.96,5.96,5.9,5.93,252400,
05/20/1997

System Specifications:
macOS High Sierra
Terminal - Bash

vgersh99 · March 5, 2019, 12:00pm

Not the "day of the week" column, but...

awk -F, '{print $0, $1}' OFS=, myFile

As a hint for the "day of the week":

$ date -d 05/14/1997
Wed, May 14, 1997 12:00:00 AM
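
Note that -d is a GNU date option. The date that ships with macOS is the BSD one, which as far as I know takes an input format with -j -f instead. A rough sketch that tries both forms; dow is a made-up helper name, and LC_ALL=C is only there to force English day names regardless of the locale:

```shell
# dow is a hypothetical helper: print the abbreviated weekday
# for a date given as MM/DD/YYYY.
dow() {
    # BSD/macOS date: -j means "do not set the clock",
    # -f gives the format of the date string that follows.
    if LC_ALL=C date -j -f '%m/%d/%Y' "$1" '+%a' 2>/dev/null; then
        return 0
    fi
    # GNU date: -d parses the date string directly.
    LC_ALL=C date -d "$1" '+%a'
}

# Usage: dow 05/14/1997
```

Running date once per input line is slow on large files, so treat this as a hint rather than a finished solution.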

RudiC · March 5, 2019, 12:07pm

Welcome to the forum.

Please post your OS, shell, and preferred tools' versions in the future so people in here don't need to guess what system features you have available and which you don't.

Your RS character \r (<CR> = 0x0D = ^M) seems a strange choice, as \n (<LF> = 0x0A) is the *nix line terminator. How was that input file produced? Also, no attempt was made to get to the desired day-of-week result.

Do you have GNU date around, and a shell with "process substitution"? If so, try

paste -d, file <(date -f<(awk -F"," '{print $1}' file) +%a)
05/20/1997,1130,5.93,5.96,5.93,5.96,49200,Di
05/20/1997,1131,5.96,5.96,5.9,5.93,252400,Di
05/14/1997,1132,5.93,5.99,5.93,5.99,89600,Mi
05/15/1997,1133,5.93,5.93,5.71,5.74,203200,Do

Peasant · March 5, 2019, 12:22pm

If your awk supports mktime and strftime (GNU awk does), see if this works for you:

awk -F"," -v OFS="," ' { split($1,a,"/") ; t=mktime(a[3] " " a[1] " " a[2] " 00 00 00") ; print $0,strftime("%a",t) } ' HB.txt
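
If mktime and strftime are not available (they are GNU awk extensions, and the awk that ships with macOS may not have them), the weekday can be computed arithmetically instead. A minimal sketch that swaps in Sakamoto's day-of-week method for the mktime call; add_dow is a made-up helper name, and it assumes column 1 always holds an MM/DD/YYYY date:

```shell
# add_dow is a hypothetical helper name (not from this thread).
# Assumes column 1 holds an MM/DD/YYYY date; uses only POSIX awk features.
add_dow() {
    awk -F, -v OFS=, '
    BEGIN {
        split("Sun Mon Tue Wed Thu Fri Sat", name, " ")
        # per-month offsets used by the Sakamoto day-of-week method
        split("0 3 2 5 0 3 5 1 4 6 2 4", off, " ")
    }
    {
        sub(/\r$/, "")          # drop a trailing carriage return, if present
        split($1, d, "/")       # d[1] = month, d[2] = day, d[3] = year
        m = d[1] + 0; y = d[3] + 0
        if (m < 3) y--          # January and February count toward the previous year
        w = (y + int(y / 4) - int(y / 100) + int(y / 400) + off[m] + d[2]) % 7
        print $0, name[w + 1]   # w is 0 for Sunday; awk arrays start at 1
    }' "$@"
}

# Usage: add_dow HB.txt > newHB.csv
```

The sub() call is there because a stray carriage return at the end of a line would otherwise push the new column onto what looks like a separate line.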

Hope that helps
Regards
Peasant.

benchin · March 5, 2019, 8:49pm

Updated*
System Specifications:
macOS High Sierra
Terminal - Bash

Thanks for the help/clues so far, I'll proceed to try it again

Don_Cragun · March 5, 2019, 9:56pm

The date, awk, and bash utilities on macOS don't provide the GNU extensions used elsewhere in this thread, but if you're willing to use ksh instead of bash, you could try:

#!/bin/ksh
while IFS=, read -r date rest
do	printf '%s,%s,%(%a)T\n' "$date" "$rest" "$date"
done < HB.txt

which, with your sample input file, produces the output:

05/20/1997,1130,5.93,5.96,5.93,5.96,49200,Tue
05/20/1997,1131,5.96,5.96,5.9,5.93,252400,Tue
05/14/1997,1132,5.93,5.99,5.93,5.99,89600,Wed
05/15/1997,1133,5.93,5.93,5.71,5.74,203200,Thu

which seems to be what you want, except that this produces 3-character abbreviations for all days of the week instead of "Tues" for Tuesdays as you seem to want. Is this close enough?

This was tested on macOS Mojave version 10.14.3 instead of on macOS High Sierra, but I don't think ksh has changed significantly between these two releases.

benchin · March 6, 2019, 1:37am

Yes this is close enough. Thanks.

However, I am getting this instead (while using ksh).

,Wed4/1997,1132,5.93,5.99,5.93,5.99,89600
05/15/1997,1133,5.93,5.93,5.71,5.74,203200,Thu
,Tue0/1997,1130,5.93,5.96,5.93,5.96,49200

Could it be that my system is not set up for this?
I am having issues with the line breaks.

RudiC · March 6, 2019, 3:00am

That's an indicator that lines are terminated with an (additional?) non-*nix-standard <CR> (^M = 0x0D = \r) character. I was told in here that this might be the default on macOS?
What baffles me is the inconsistency between lines: two with it vs. one without.
Try

tr -d '\r' <file

not sure if and how this helps on macOS ...

Don_Cragun · March 6, 2019, 3:18am

Expanding a little bit on what RudiC said ....

It looks like the file you are processing is partially in DOS text file format with some <carriage-return> <newline> character pair line separators instead of UNIX text file format <newline> single character line terminators. But, as RudiC said, we would expect that to happen on every line; not just the first and last.

Please confirm by running the command:

od -bc filename

where filename is the name of the input file you processed to get the output you showed us in post #7 in this thread.
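
If the od output is too long to read through, a quick count of the affected lines may be enough. A small sketch using awk; count_cr is a made-up helper name:

```shell
# count_cr is a hypothetical helper: report how many lines end in a
# carriage return and how many lines were read in total.
count_cr() {
    awk '/\r$/ { cr++ } END { printf "%d of %d lines end in CR\n", cr, NR }' "$@"
}

# Usage: count_cr filename
```

If the first number is greater than zero but smaller than the second, the file has mixed line endings, which would match the inconsistent output shown earlier.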

To get rid of the <carriage-return>s in a file you can use:

tr -d '\r' < old_file > new_file

where old_file is the name of a file containing <carriage-return>s and new_file is the name of the file that you want to contain the contents of old_file without the <carriage-return>s. If this is a common problem with input files you'll be processing, you can change the script I suggested before to be:

#!/bin/ksh
tr -d '\r' < HB.txt | while IFS=, read -r date rest
do	printf '%s,%s,%(%a)T\n' "$date" "$rest" "$date"
done

or:

#!/bin/ksh
while IFS=, read -r date rest
do	printf '%s,%s,%(%a)T\n' "$date" "$rest" "$date"
done < HB.txt | tr -d '\r'

benchin · March 6, 2019, 4:50am

Thank you RudiC & Don! I have learned a lot today. It works.