Sorting Date Field with Sort -k :/

SOLVED :

Please post a data sample which contains commas as delimiters. As posted, the delimiter for the data field is a solidus and there are no comma-delimited fields.

I know exactly what assignment this is for so I won't give away too much.

When you're sorting, think about how you're splitting that sort... You're splitting it by the comma, yet you want to sort by date which has a forward-slash delimiting it.

Also, think about what order those details are being placed in birthday.csv it might make things simpler for you...

mike,0208,19/08/1991
nickle,0182,02/07/1960
...

Like that?

Thank you for hasty reply.

---------- Post updated at 12:15 PM ---------- Previous update was at 12:06 PM ----------

As far as I am aware the / has nothing to do with it as I am using the -k wildcard which does it by position.
:confused:

As I said, I know this is an assignment, so if I just give you the answer, we'll both get into trouble and you won't learn anything. I can see you're actually trying (unlike most of our peers), but the fact remains that this is an assignment meant to test our abilities.

Look at what sort has to offer. -n offers a numerical sort (great for dates which have been chopped up...)

As I said previously to you, this is a huge hint now, make things easier for yourself and THINK about the order of your .csv file. Who said those details had to go in that order you've specified...

Yes, you will still need -k, too. But think about what it does.

That's all I'm offering. :stuck_out_tongue:

You were pretty close. In the comma-separated file the date is indeed field number 3.

However to sort to date order we need to sort to Year-Month-Day order (not Day-Month-Year order). Because we don't have a consistent delimiter we are reduced to specifying character positions within field 3.

sort -t',' -k3.7n,3.10n -k3.4n,3.5n -k3.1n,3.2n testData.csv

There may be much more to do depending on how you want to deal with duplicate dates. Think about the order of fields 1 and 2 in that situation.

This is also an exercise in data file design. You can make life so much easier by storing dates in an easy-to-sort and easy-to-match format.

Or, you know, one of the mods just gifts you the answer :smiley:

Lucky boy.

@whyte_rhino
There are rules for posting homework of which the O/P will be aware. Otherwise we assume a commercial environment.
This situation with weird sorts is often reached after poor system design.

On re-reading, I see that you know a filename which did not appear in post #1. Either a tutor or a someone working on the same problem?

Methyl, appreciate the help, however I did have that sort the first time round, starting the years first but I still have the same problem; Could it be my translation messing about with the ',' and '/'s?

Also rhyno, I did try what you said before methyl posted an answer;

I swapped the date to the first part of the file, and then used the delimiter as the / to skip apart however I thought it would but no luck.

So the problem I still have is that the -k only does one part of sorting.
So in this example it has only sorted the last part of testData.csv

Yeah, I was working on the same problem (a couple of weeks back now), but it's due in next week so you're probably going to get a lot of traffic soon :stuck_out_tongue:

This particular issue is probably the most difficult aspect of it, too, which is why I was keen not to just give him the answer.

You're not to know, though, and the support on this forum helped me tremendously.

Again; to clarify, I am not looking for an answer, but rather why my solution isn't working.

Post your new sort code, might as well have a crack it now you have the answer anyway :stuck_out_tongue:

I originally did this:

sort -n -k 2.9 -k 2.5 -k 2 testData.csv

Which is similar to methyl's solution which works off the year first, but the problem I get is it will only sort one part, not all of them.

The original problem persists.

Without seeing the command which went wrong or the matching data or the result ... here is my guess.
If you started with say:

sort -n -t","  -k3.7 -k3.4 -k3.1 testData.csv

Within the date field dd/mm/yyyy :
k3.7 matches yyyy ... okay so far.
k3.4 matches mm/yyyy ... its going wrong
k3.1 matches dd/mm/yyyy ... its gone very wrong
Thus the file actually gets sorted to dd/mm/yyyy order and the first two sort keys become irrelevant. The problem was overlapping sort keys. The fact that you did not specify a numeric sort was actually irrelevant in this circumstance because the order of a character sort would be the same.

(Though our posts crossed, the problem looks the same to me. We need to show where the key stops as well as where it starts if we do not have a suitable delimiter).

Let's think methodically what we want to do here:

I'm assuming I have this data in the csv in this order:

11/12/2001,1234,Paul Stephens

So, I want to sort by number

sort -n 

then I want split that date, thankfully the date has a nice delimiter to use:

sort -n  -t"/" 

So now I can sort the date easily, but as this date is dd/mm/yyyy I need to reverse it to make the sort work properly:

sort -n  -t"/" -k3 -k2 -k1 testData.csv 

You might want to use a function that -o(utputs) to a file...

sort -t',' -k3.7n,3.10n -k3.4n,3.5n -k3.1n,3.2n testData.csv

Yea I understand, and that's why yours specifies the exact locations:
Yet, it still fails... :cry: ... It's really getting on my nerves, but I appreciate your constant support.

---------- Post updated at 01:05 PM ---------- Previous update was at 01:03 PM ----------

In my above post I said I tried that; but it still failed; which is why I am convinced there is something wrong with my client and the ,'s and the /'s .

I am not looking for an answer, I already had one, but something's going wrong. I just can't see it.

_______________________________
_____________________________

EDIT:

Tried various variations of code:

And with this for example:

sort -t',' -k3.7n,3.10n -k3.4n,3.5n -k3.1n,3.2n birthday.csv

Minas,284729,14/05/1956
Bono,92374,08/07/1970
Glads,02883,07/01/1987
Mario,09281,02/17/1987
Boomer,0182,18/11/1988
Jesus,01322,17/03/1991

That's the output: As you can see sorted in third column, not others :cry:

sort -t',' -k3.7n,3.10n -k3.4n,3.5n -k3.1n,3.2n birthday.csv

Minas,284729,14/05/1956
Bono,92374,08/07/1970
Glads,02883,07/01/1987
Mario,09281,02/17/1987
Boomer,0182,18/11/1988
Jesus,01322,17/03/1991

Imho. I can't see anything wrong with this output. Assuming the test data is meant to be dd/mm/yyyy (despite month "17" in the middle column!), the data is in the correct sorted order - ascending yyyy/mm/dd.
Try a bigger data file with a decent variation of dates (I did). Your test data only has one case where the dates are in different years.

1 Like

Ok Methyl, thank you for your help; that last post was most helpful as you actually pointed out to me that I was looking at it all wrong; stupid of me, but I was somehow expecting it to sort by date, and then move all the months in order and dates, which is impossible without altering. So thanks for pointing it out, one of the first sets of code I used worked, so thanks for pointing out something staring in my face.