Save output of updated csv file as csv file itself

Hi, all

I want to sort a csv file based on timestamp from oldest to newest and save the output as csv file itself. Here is an example of my csv file.

test.csv

SourceFile,DateTimeOriginal
/home/intannf/foto/IMG_0739.JPG,2015:02:17 11:32:21
/home/intannf/foto/IMG_0749.JPG,2015:02:17 11:37:28
/home/intannf/foto/IMG_0759.JPG,2015:02:17 11:49:09
/home/intannf/foto/IMG_0771.JPG,2015:02:17 11:51:22
/home/intannf/foto/IMG_0745.JPG,2015:02:17 11:36:01
/home/intannf/foto/IMG_0763.JPG,2015:02:17 11:49:45
/home/intannf/foto/IMG_0666 (1).JPG,2015:02:17 09:59:48
/home/intannf/foto/IMG_0782.JPG,2015:02:17 11:54:16
/home/intannf/foto/IMG_0750.JPG,2015:02:17 11:37:29
/home/intannf/foto/IMG_0762.JPG,2015:02:17 11:49:24
/home/intannf/foto/IMG_0752.JPG,2015:02:17 11:37:40
/home/intannf/foto/IMG_0684.JPG,2015:02:17 10:54:51
/home/intannf/foto/IMG_0687.JPG,2015:02:17 10:56:43
/home/intannf/foto/IMG_0777.JPG,2015:02:17 11:53:43
/home/intannf/foto/IMG_0756.JPG,2015:02:17 11:45:21
/home/intannf/foto/IMG_0746.JPG,2015:02:17 11:37:18
/home/intannf/foto/IMG_0743.JPG,2015:02:17 11:34:26

Output required

SourceFile,DateTimeOriginal
/home/intannf/foto/IMG_0666 (1).JPG,2015:02:17 09:59:48
/home/intannf/foto/IMG_0684.JPG,2015:02:17 10:54:51
/home/intannf/foto/IMG_0687.JPG,2015:02:17 10:56:43
/home/intannf/foto/IMG_0739.JPG,2015:02:17 11:32:21
/home/intannf/foto/IMG_0743.JPG,2015:02:17 11:34:26
/home/intannf/foto/IMG_0745.JPG,2015:02:17 11:36:01
/home/intannf/foto/IMG_0746.JPG,2015:02:17 11:37:18
/home/intannf/foto/IMG_0749.JPG,2015:02:17 11:37:28
/home/intannf/foto/IMG_0750.JPG,2015:02:17 11:37:29
/home/intannf/foto/IMG_0752.JPG,2015:02:17 11:37:40
/home/intannf/foto/IMG_0756.JPG,2015:02:17 11:45:21
/home/intannf/foto/IMG_0759.JPG,2015:02:17 11:49:09
/home/intannf/foto/IMG_0762.JPG,2015:02:17 11:49:24
/home/intannf/foto/IMG_0763.JPG,2015:02:17 11:49:45
/home/intannf/foto/IMG_0771.JPG,2015:02:17 11:51:22
/home/intannf/foto/IMG_0777.JPG,2015:02:17 11:53:43
/home/intannf/foto/IMG_0782.JPG,2015:02:17 11:54:16

and this is my script:

sort --field-separator=',' --key=2 -n test.csv > test.csv

Unfortunately, when i checked them out, there is no data in the test.csv. It is blank. Or it should be saved as another csv file? But i need to save it as csv file itself.

Please help me how to solve it. Thanks in advance.

Regards,
Intan

If you read from a file and write to the same file in one operation, you'll end up corrupting the file. Work-around is to write to a temp file and then rename the temp file.

sort .... test.csv > temp
mv temp test.csv
1 Like

The sort utility actually has an option to let you put the output in one of the input files... Try:

sort --field-separator=',' --key=2 -n -o test.csv test.csv

Note that since : is not a numeric character, using the -n option to sort means you are only sorting on the year in the 2nd field with your secondary sort key being the filename. If you want the month, day, hour, minute, and second portions of the 2nd field on these lines to participate in your primary sort key; remove the -n option.

1 Like

or:

awk -F, '{t=$2;gsub("[^0-9]","",t);print t, $0}' OFS=, test.csv | sort -n -k1,1 | cut -d, -f2-
1 Like

This is extremely complicated for data that is in a form where a string comparison will work just fine (and, it doesn't solve the problem of redirecting the output wiping out the input). Just using standard sort utility options, the following does everything that was requested:

sort -k2,2 -o test.csv -t, test.csv

Thanks for your helps, balajesuri, Don Cragun and vgersh99! :smiley:

Now I can save my csv file as csv file itself. and personally, i like one line code because it looks simple and not complex. So, i prefer to use Don Cragun's code. I used this code:

sort --field-separator=',' --key=2 -n -o test.csv test.csv

Because when i tried to run this code:

sort -k2,2 -o test.csv -t, test.csv

the output that i got is like this:

/home/intannf/foto/IMG_0666 (1).JPG,2015:02:17 09:59:48
/home/intannf/foto/IMG_0684.JPG,2015:02:17 10:54:51
/home/intannf/foto/IMG_0687.JPG,2015:02:17 10:56:43
/home/intannf/foto/IMG_0739.JPG,2015:02:17 11:32:21
/home/intannf/foto/IMG_0743.JPG,2015:02:17 11:34:26
/home/intannf/foto/IMG_0745.JPG,2015:02:17 11:36:01
/home/intannf/foto/IMG_0746.JPG,2015:02:17 11:37:18
/home/intannf/foto/IMG_0749.JPG,2015:02:17 11:37:28
/home/intannf/foto/IMG_0750.JPG,2015:02:17 11:37:29
/home/intannf/foto/IMG_0752.JPG,2015:02:17 11:37:40
/home/intannf/foto/IMG_0756.JPG,2015:02:17 11:45:21
/home/intannf/foto/IMG_0759.JPG,2015:02:17 11:49:09
/home/intannf/foto/IMG_0762.JPG,2015:02:17 11:49:24
/home/intannf/foto/IMG_0763.JPG,2015:02:17 11:49:45
/home/intannf/foto/IMG_0771.JPG,2015:02:17 11:51:22
/home/intannf/foto/IMG_0777.JPG,2015:02:17 11:53:43
/home/intannf/foto/IMG_0782.JPG,2015:02:17 11:54:16
SourceFile,DateTimeOriginal

the header is written below the data. Would you tell me why it can be like that?

Sorry, I didn't notice the heading line before. Doing a numeric sort, the string DateTimeOriginal evaluates to 0 and sorts before 2015 ; while doing an alphanumeric sort, numbers sort before letters. So, you need the -n option to make the heading come out first. If you use numeric sorting, you need to make the sort key more complex to use the each year, month, day, hour, minute, and second field separate keys to avoid just using the year as your sort key:

sort -n -k2.1,2.4 -k2.6,2.7 -k2.9,2.10 -k2.12,2.13 -k2.15,2.1 -k2.18,2.19 -o test.csv -t, test.csv

But with your data we can use numeric sorting for the year and use alphanumeric sorting for the month, day, hour, minute and second as a single field, which siimplifies the command to just:

sort -k2.1,2.4n -k2.6,2 -o test.csv -t, test.csv

Your relatively simple sort seems to work because the image numbers in your pathnames are monotonically increasing as the time stamp increases AND all of your image files are in a single directory. If your real image files could be in more than one directory, might have different numbers of digits in the sequence numbers, or might come from different cameras with different image ranges, one of the above sort commands should sort correctly by date and time for you.

1 Like

That's because characters sort below digits. And it would surprise me if the outputs of the two commands were different.
To obtain the desired result with the header on top of the data, try

head -1 test.csv; tail -n+2 test.csv | sort -t, -k2
1 Like

Hi, RudiC

Thanks for your help. Would you mind to explain those script to me? I don't understand yet about it. Thanks in advance.

Regards,
Intan

---------- Post updated at 12:36 AM ---------- Previous update was at 12:25 AM ----------

Hi, Don Cragun

Thanks for your help. Would you mind to explain those script to me? I am getting confused about this part:

-k2.1,2.4n -k2.6,2

I think k2 means the second row of the file. Is that correct? But i don't understand the other yet. Thanks in advance.

Regards,
Intan

Have you tried looking at the man page for the sort , head , and tail utilities?

The command:

sort -k2.1,2.4n -k2.6,2 -o test.csv -t, test.csv

sorts using a primary sort key that starts with the 1st character in the 2nd field and ends with the 4th character in the second field treating that text as a numeric value ( -k2.1,2.4n ), using a secondary sort key that starts with the 6th character in the second field and ends with the last character of the second field as an alphanumeric value ( -k2.6,2 ), putting the sorted output into the file test.csv ( -o test.csv ), using comma as the field delimiter ( -t, ), and reading input to be sorted from the file test.csv ( test.csv ).

Using a numeric sort on data that starts with a non-numeric character (in this case, the 1st 4 characters of DateTimeOrigina ) is treated as zero (which sorts before the year in the other lines in your file). This makes the heading sort to the beginning of your file instead of the end of your file without having to split out the 1st line of your file (as RudiC did with head -1 test.csv ) and sorting the remainder of your input file ( head -n+2 test.cv | sort... ).

head -1 prints test's first line - the header.
tail -n+2 prints all lines starting with test's second line, then sort -t, -k2 does the rest. You may want to specify the sort keys in more detail, though.