awk script to extract a column, replace one of the header and replace year(from ddmmyy to yyyy)

Kunalcurious · May 10, 2019, 7:21am

I have a csv which has lot of columns . I was looking for an awk script which would extract a column twice. for the first occurance the header and data needs to be intact but for the second occurance i want to replace the header name since it a duplicate and extract year value which is in ddmmyy and replace it with yyyy.

my csv looks like

"abc","sdf","sdf","we","qe","fr","ty","hgf","er","jy","PERIOD_END_DATE"
"123","","234","ety","","ghj","tyu","Inh","dn","ngnh","17-FEB-19"

what i want to do is use an awk script to print the 11th column twice and extract the year from the data and print it in yyyy format as below:

"PERIOD_END_DATE" "YEAR"
"17-FEB-19" "2019"

Currently i can extract the following from my csv:

"PERIOD_END_DATE" "PERIOD_END_DATE"
"17-FEB-19" "17-FEB-19"

using the awk script as follows:

awk -F ',' '{print $11,$11"\r"}' test1.csv > test2.csv

But what i need is the second column header to be replaced to "Year" and extract the year from the columns and print it in yyyy format for the whole column, so it will look like :

 "PERIOD_END_DATE" "YEAR"
"17-FEB-19" "2019"

Can anyone help me with this, Please ask for more clarity if this is unclear.

anbu23 · May 10, 2019, 7:30am

$ awk -F, -v yr='"YEAR"' ' { $0=$0 OFS (NR == 1 ? yr : "20" substr($0,length($0)-2,3)) } 1 ' OFS=","  file
"abc","sdf","sdf","we","qe","fr","ty","hgf","er","jy","END_DATE","YEAR"
"123","","234","ety","","ghj","tyu","Inh","dn","ngnh","17-FEB-19",2019"

Kunalcurious · May 10, 2019, 7:33am

will this year replacement will replace any year, say the data might have 2018, 2019, 2015, 2016, so all it gotta do is extract 18, 19 and add 20 in front so the value becomes 2014,2015,2016,2017,2018, 2019. Irrespective of the year. is this possible? Also, the file is a csv so I want to pass the csv file to the awk script since the file is huge, as can be seen i had input csv file to the script
test1.csv > test2.csv

anbu23 · May 10, 2019, 7:48am

Yes this will replace any year(If year is less than 2000 then prefix 19) and you can pass csv file to awk

Kunalcurious · May 10, 2019, 7:55am

@anbu23 for some reason it acts funny. the command I used is:

awk -F, -v yr='"YEAR"' ' { $0=$0 OFS (NR == 1 ? yr : "20" substr($0,length($0)-2,3)) } 1 ' OFS="," testk3.csv > test5.csv

and the output is:

,"YEAR"_END_DATE" "PERIOD_END_DATE"
,209"EB-19" "17-FEB-19"
,209"EB-19" "24-FEB-19"
 ,209"EB-19" "24-FEB-19"

Note: I made 1 mistake on my part while asking the question and before making an edit i mentioned "END_DATE" instead of "PERIOD_END_DATE". I apologize for the inconvenience.

anbu23 · May 10, 2019, 9:15am

can you post sample data from testk3.csv?

Kunalcurious · May 10, 2019, 9:31am

These are some lines from the file:

"PERIOD_END_DATE" "PERIOD_END_DATE"
"17-FEB-19" "17-FEB-19"
"24-FEB-19" "24-FEB-19"
"24-FEB-19" "24-FEB-19"

anbu23 · May 10, 2019, 9:34am

$ cat testk3.csv
"PERIOD_END_DATE" "PERIOD_END_DATE"
"17-FEB-19" "17-FEB-19"
"24-FEB-19" "24-FEB-19"
"24-FEB-19" "24-FEB-19"
$ awk -F, -v yr='"YEAR"' ' { $0=$0 OFS (NR == 1 ? yr : "20" substr($0,length($0)-2,3)) } 1 ' OFS="," testk3.csv
"PERIOD_END_DATE" "PERIOD_END_DATE","YEAR"
"17-FEB-19" "17-FEB-19",2019"
"24-FEB-19" "24-FEB-19",2019"
"24-FEB-19" "24-FEB-19",2019"

Kunalcurious · May 10, 2019, 9:40am

Strange, when i run the following command:

awk -F, -v yr='"YEAR"' ' { $0=$0 OFS (NR == 1 ? yr : "20" substr($0,length($0)-2,3)) } 1 ' OFS="," testk3.csv > testk5.csv

it shows me output as before:

$ head testk3.csv
"PERIOD_END_DATE" "PERIOD_END_DATE"
"17-FEB-19" "17-FEB-19"
"24-FEB-19" "24-FEB-19"
"24-FEB-19" "24-FEB-19"
"17-FEB-19" "17-FEB-19"
"03-MAR-19" "03-MAR-19"
"10-MAR-19" "10-MAR-19"
"17-FEB-19" "17-FEB-19"
"03-MAR-19" "03-MAR-19"
"24-FEB-19" "24-FEB-19"



$ head testk5.csv
,"YEAR"_END_DATE" "PERIOD_END_DATE"
,209"EB-19" "17-FEB-19"
,209"EB-19" "24-FEB-19"
,209"EB-19" "24-FEB-19"
,209"EB-19" "17-FEB-19"
,209"AR-19" "03-MAR-19"
,209"AR-19" "10-MAR-19"
,209"EB-19" "17-FEB-19"
,209"AR-19" "03-MAR-19"
,209"EB-19" "24-FEB-19"

RudiC · May 10, 2019, 4:57pm

You seem to have DOS line terminators (<CR> = ^M = 0x0D = \r) in your file - how /where did you produce it? Remove them like

$ awk -F, -v yr='"YEAR"' ' { sub (/\r$/, ""); $0=$0 OFS (NR == 1 ? yr : "20" substr($0,length($0)-2,2)) } 1 ' OFS="," 
"17-FEB-19" "17-FEB-19",2019
"24-FEB-19" "24-FEB-19",2019
"24-FEB-19" "24-FEB-19",2019

Don_Cragun · May 10, 2019, 6:03pm

If we go back to post #1 in this thread, you might notice that Kunalcurious used the awk script:

awk -F ',' '{print $11,$11"\r"}' test1.csv > test2.csv

so there no reason to wonder where the <carriage-return>s came from...