Modify comma delimited file columns.

KrishnaVM · January 25, 2017, 11:17pm

Please help me to update a file which contains date values as below:-

From:-

"1912108",20161130,"2016-12-01-00.00.00.000000","2016-12-01-08.37.12.000000"
"1912108",20161201,"2016-12-02-00.00.00.000000","2016-12-02-08.28.22.000000"

To:-

"1912108",2016-11-30,"2016-12-01 00.00.00.000000","2016-12-01 08.37.12.000000"
"1912108",2016-12-01,"2016-12-02 00.00.00.000000","2016-12-02 08.28.22.000000"

Don_Cragun · January 25, 2017, 11:45pm

krishnavm:

Please help me to update a file which contains date values as below:-

From:-

"1912108",20161130,"2016-12-01-00.00.00.000000","2016-12-01-08.37.12.000000"
"1912108",20161201,"2016-12-02-00.00.00.000000","2016-12-02-08.28.22.000000"

To:-

"1912108",2016-11-30,"2016-12-01 00.00.00.000000","2016-12-01 08.37.12.000000"
"1912108",2016-12-01,"2016-12-02 00.00.00.000000","2016-12-02 08.28.22.000000"

Is this a homework assignment? Homework and coursework questions can only be posted in the Homework & Coursework forum under special homework rules.

If this is not homework, please explain the company you work for and the nature of the problem you are working on. And please show us (in CODE tags) what you have tried to do to solve this problem on your own and tell us what operating system and shell you're using.

If you did post homework in the main forums, please review the guidelines for posting homework and repost.

KrishnaVM · January 26, 2017, 8:21am

Hi Admin,

This isn't a home work.
I am dealing with a flat file on RedHat Linux Server, which has millions of records (rows) and about 65 columns. Out of which only 3 columns needs this correction as below:-

From:-

"1912108",20161130,"2016-12-01-00.00.00.000000","2016-12-01-08.37.12.000000"
"1912108",20161201,"2016-12-02-00.00.00.000000","2016-12-02-08.28.22.000000"

To:-

"1912108",2016-11-30,"2016-12-01 00.00.00.000000","2016-12-01 08.37.12.000000"
"1912108",2016-12-01,"2016-12-02 00.00.00.000000","2016-12-02 08.28.22.000000"

I tried to format this as below, which works fine with the files that has few columns:-

awk -F "," '{print $1"|"substr($2,1,4)"-"substr($2,5,2)"-"substr($2,7,2)"|"substr($3,1,11)" "substr($3,13,26)"|"substr($4,1,11)" "substr($4,13,26)}' filename

Seeking help from this forum for a better solution, to deal with the file that has many columns in which case it is not a good idea to include all the columns under AWK print statement.

Thanks.

Scrutinizer · January 26, 2017, 8:52am

Hi, try using a regular expression instead..:

sed 's/\(-[^-]*-[^-]*\)-/\1 /g' file

or:

perl -pe 's/(-.*?-.*?)-/$1 /g' file

or since it is a date and it is always two character between the dashes this may suffice:

sed 's/\(-..-..\)-/\1 /g' file

or:

perl -pe 's/(-..-..)-/$1 /g' file

or:

perl -pe 's/(?<=-..-..)-/ /g' file

RudiC · January 26, 2017, 9:16am

To allow for the correction of field 2, try

sed  's/\(-[^-]*-[^-]*\)-/\1 /g; s/^\([^,]*,[0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)/\1-\2-\3/' file
"1912108",2016-11-30,"2016-12-01 00.00.00.000000","2016-12-01 08.37.12.000000"
"1912108",2016-12-01,"2016-12-02 00.00.00.000000","2016-12-02 08.28.22.000000"

KrishnaVM · January 26, 2017, 10:59am

It worked. Thanks a lot.

$sed  's/\(-[^-]*-[^-]*\)-/\1 /g; s/^\([^,]*,[0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)/\1-\2-\3/' test1
"1912108",2016-11-30,"2016-12-01 00.00.00.000000","2016-12-01 08.37.12.000000"
"1912108",2016-12-01,"2016-12-02 00.00.00.000000","2016-12-02 08.28.22.000000"

---------- Post updated at 10:59 AM ---------- Previous update was at 09:22 AM ----------

This worked well with timestamp columns (3 and 4).
In my flat file, even though I have them numbered as 64 and 65 th columns.

I am trying to tweak this logic to make it work (without success) - with other flat file that has the date column (field 2) as a field 3:-

"1080909","ABC",20161031,"2016-09-02-00.00.00.000000","2016-09-02-08.30.05.000000"
"1080091","XYZ",20160830,"2016-11-02-00.00.00.000000","2016-11-02-08.27.48.000000"

I am very new to AWK and SED. Can you please help.

RudiC · January 26, 2017, 11:14am

That's becoming nasty, see:

sed  -r 's/^(([^,]*,){2})([0-9]{4})([0-9]{2})([0-9]{2})/\1\3-\4-\5/; s/(-..-..)-/\1 /g' file
"1080909","ABC",2016-10-31,"2016-09-02 00.00.00.000000","2016-09-02 08.30.05.000000"
"1080091","XYZ",2016-08-30,"2016-11-02 00.00.00.000000","2016-11-02 08.27.48.000000

KrishnaVM · January 26, 2017, 11:21am

That works.
Thanks a Ton.

--
Apologies for missing to use code tags.