Processing of log files

Hi,

I have log files with names in formats like these:

fn2013.12.13.log
fn2013.12.13_a.log
fn2013.12.13_b.log

The suffix is the part after the underscore (i.e. a.log or b.log).

I need to process the files in ascending date order but descending suffix order, and check that each file exists and is larger than zero bytes. On failure, an appropriate message should be displayed, such as "file is of zero size".

Coming from a basic Unix background, I find it difficult to process the files using the sorting logic described above. Could someone help me with this?

I would really appreciate it if someone could help me with this or guide me through it.

Thanks.

How would the above three sample files be sorted? Where would the no-suffix file fn2013.12.13.log end up?

Thanks for the reply!

Well, if it does not have a suffix, it will just be sorted by date, the first parameter. I need to pull the date (2013.12.13) and the suffix (a.log or b.log), if present, from the file names.

That doesn't answer the question... If you have the following files:

fn2013.12.13.log
fn2013.12.13_a.log
fn2013.12.13_b.log
fn2013.12.14.log
fn2013.12.14_a.log
fn2013.12.15.log
fn2013.12.16.log
fn2013.12.17.log
fn2013.12.17_a.log
fn2013.12.17_b.log
fn2013.12.17_c.log

exactly what are you trying to do with these files? And, exactly what output(s) are you trying to produce with this set of files?

Hi Don,

Basically, the requirement is to process the files in ascending date (2013.12.13) order, but descending suffix (a.log) order, and for every date and suffix, we need to check if the file exists and is greater than 0 size. Also, on failure, we need to output an appropriate message.

So far I have tried this, but it does not give me the expected sorted result. Could someone point out where I am making the mistake?

for file in `cat file.txt | sort -k1,1nr -k2,2 -t _`; do echo $file; done

# Here is the file.txt :

ski2015.12.15.12.30.23.log
ski2015.12.15.12.30.24_a.log
ski2015.12.15.12.30.25_b.doc
ski2015.12.15.12.30.25_b.doc
ski2015.12.15.12.30.25_b.log
ski2015.12.15.12.30.25_b.log
ski2015.12.15.12.30.25_b.log
ski2015.12.14.12.30.24_a.log
ski2015.12.14.12.30.25_b.doc
ski2015.12.14.12.30.25_b.doc
ski2015.12.14.12.30.13_b.log
ski2015.12.14.12.29.25_b.log
ski2015.12.14.12.30.25_b.log
ski2015.12.14.12.30.23.log
ski2015.12.14.12.29.25_a.log
ski2015.12.15.12.30.23_a.log
ski2015.12.14.12.29.25_b.log
ski2015.12.14.12.29.25_c.log

Thanks.

This thread is titled "Processing of log files".

I don't want "Basically, the requirement is to process...". I gave you an explicit list of .log files in post #4 and asked what output you want when given that list of .log files.

Instead of answering that question you showed us an unsorted list of .log and .doc files (with some of those files included in the list twice). Are .doc files now considered log files?

How did you get a list of files with duplicate names in that list? If you ran a find command to produce a combined list of files from multiple directories and stripped off the directory names, how do you expect a script to be able to guess at which directory a file came from to determine its size?

You have said you want to process the files with a suffix a.log and b.log and you imply that a file named ski2015.12.15.12.30.23.log doesn't have a suffix because there is no _ in that name. Please take the list I gave you or the list you gave us in post #5 and show us the EXACT output that you want to produce from that list (and explain why some files, if there are any, are not included in the output).

It is easy to see that a sort that sorts a list in reverse numeric order (when the field being sorted is not numeric) and for matching fields on that key sorts in increasing alphanumeric order on a second key doesn't give you a list in the order you have described. What is not easy is trying to guess what files are supposed to be excluded from your output and the order you do want for files that are not excluded if some of those files do not contain an _ character.
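To make the point about the numeric key concrete, here is a small sketch. The two file names are shortened, made-up examples (not taken from the lists in this thread), and LC_ALL=C is only there to pin the collation order. With -t _ the first field starts with letters, so -n reads it as 0 for every line and the date never takes part in the comparison.

```shell
# Two hypothetical names; the later date deliberately comes first.
names='ski2015.12.15_a.log
ski2015.12.14_b.log'

# Same sort options as in the attempt above.
# Key 1 (-k1,1nr): "ski2015.12.15" and "ski2015.12.14" both parse as 0, a tie.
# Key 2 (-k2,2):   "a.log" sorts before "b.log", so this key alone decides.
sorted=$(printf '%s\n' "$names" | LC_ALL=C sort -t _ -k1,1nr -k2,2)

printf '%s\n' "$sorted"
```

The list comes back in its original order: the 2015.12.15 entry stays ahead of the 2015.12.14 entry purely because of its suffix.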

Why do you need a list of "log" files other than that produced by letting the shell expand *.log as a filename matching pattern in the directory where these files are located? Or, if you only want files with a suffix (as you have described it), why not just use *_*.log to get the list?
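As a rough sketch of that suggestion, something like the following would list only the names that carry a suffix. The function name list_suffixed_logs is made up for illustration, and the directory is passed in as an argument because the thread never says where the files live.

```shell
# Print the base names of all files matching *_*.log in directory $1.
# The shell expands the pattern in sorted (ascending) order.
list_suffixed_logs() {
    for f in "$1"/*_*.log; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        printf '%s\n' "${f##*/}"
    done
}
```

Calling list_suffixed_logs . in the log directory would skip fn2013.12.13.log and any .doc files, since neither matches the pattern.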

The structure of the entries in your file.txt doesn't fit those in post#1, and it doesn't represent dates. It vaguely reminds me of date/time entries; should the time be considered in the sort?

Please be way more specific and precise; post a sample output that you need!

Hi Don/Rudi,

I regret if I could not present it properly, as there are couple of file types which appears until third character of every files ( ie fn, ski etc ).

Here is my sample file :

sn2015.12.14.12.29.25_a.log
sn2015.12.14.12.29.25_c.log
sn2015.12.14.12.29.25_b.log
sn2015.12.14.12.29.25.log
sn2015.12.14.12.30.13_b.log
sn2015.12.14.12.30.23.log
sn2015.12.14.12.30.24_a.log
sn2015.12.14.12.30.25_b.log
sn2015.12.15.12.30.23_a.log
sn2015.12.15.12.30.23_a.log
sn2015.12.15.12.30.23.log
sn2015.12.15.12.30.24_b.log

#Expected Output

sn2015.12.14.12.29.25.log
sn2015.12.14.12.29.25.a.log
sn2015.12.14.12.29.25.b.log
sn2015.12.14.12.29.25.c.log
sn2015.12.14.12.30.23.log
sn2015.12.14.12.30.13_b.log
sn2015.12.14.12.30.24_a.log
sn2015.12.14.12.30.25_b.log
sn2015.12.15.12.30.23.log
sn2015.12.15.12.30.23_a.log
sn2015.12.15.12.30.24_b.log

Again thank you for your responses.

How come sn2015.12.14.12.30.13_b.log is sorted below sn2015.12.14.12.30.23.log ? What is the suffix in sn2015.12.14.12.29.25.a.log ?

Do you really expect a reasonable solution based on a dubious specification as given in several posts even though clear questions were asked?

My Bad -

Here is the expected output :

sn2015.12.14.12.29.25.log
sn2015.12.14.12.29.25_a.log
sn2015.12.14.12.29.25_b.log
sn2015.12.14.12.29.25_c.log
sn2015.12.14.12.30.23.log
sn2015.12.14.12.30.13_b.log
sn2015.12.14.12.30.24_a.log
sn2015.12.14.12.30.25_b.log
sn2015.12.15.12.30.23.log
sn2015.12.15.12.30.23_a.log
sn2015.12.15.12.30.24_b.log

Sorry - I can't help. I came up with a solution but it doesn't match your expected output.

OK. So your sample input contains duplicates. (And you refuse to tell us why.) Getting rid of duplicates is easy.

You tell us that dates are to be in ascending order and suffixes are to be in descending order, but your expected output above has dates in a seemingly random order (assuming that the timestamp is part of what you are calling the date) and suffixes in increasing order (not decreasing). If the timestamp is to be ignored when sorting the data, then the suffixes you say you want are in random order in your list above.

And, you say: "I regret if I could not present it properly, as there are couple of file types which appears until third character of every files ( ie fn, ski etc )." Is this supposed to mean something in the way the output is to be sorted??? You have not shown us any input or output that has more than a single sequence of two or three letters that appear in every file in each list you have shown us.

I am sorry, but I am unable to ascertain any pattern in the output you want. And, even if we assume that the line marked in red above is a typo, the output you have shown us doesn't come close to what you said you want.

I give up.

So can I clarify your requirements? For each numerical section of the file name, you want to sort ascending, and then for each suffix you want to sort descending, as in from z to a? That doesn't match your output as far as I can see (although I haven't scrutinised it too closely).

Could you clearly paste (not just make up) some sample input and then manually work out the required output, getting all the possibilities ironed out?

It might be possible with a single sort command specifying the numeric bit as a primary sort ascending and the suffix as a secondary descending if that's what you need.
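If that is what is wanted, one way to sketch it is to put the sort keys in front of each name, sort on them, and strip them off again. This is only a guess at the requirement and rests on several assumptions that have not been confirmed in this thread: the leading letters (fn, sn, ski) are ignored for ordering, every date/time part is zero-padded so a plain string comparison orders it correctly, files without a suffix come last within their timestamp, anything not ending in .log is dropped, and duplicate names are listed once. The function name sort_logs is made up.

```shell
# Read file names on stdin, write them in ascending timestamp order
# and, within one timestamp, descending suffix order (c, b, a, none).
sort_logs() {
    # Decorate each name as "<timestamp> <suffix> <name>";
    # "-" stands in for a missing suffix and sorts below any letter.
    sed -n \
        -e 's/^\([a-z]*\)\([0-9.]*[0-9]\)_\([^.]*\)\.log$/\2 \3 &/p' \
        -e 's/^\([a-z]*\)\([0-9.]*[0-9]\)\.log$/\2 - &/p' |
    # Key 1 ascending (timestamp), key 2 reversed (suffix).
    LC_ALL=C sort -k1,1 -k2,2r |
    # Undecorate: keep only the original name, then drop repeats.
    cut -d ' ' -f 3- |
    uniq
}
```

It could be run as sort_logs < file.txt. Whether the no-suffix file should really come after the suffixed ones is exactly the question that still needs an answer.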

Can you tell me the order in which the files you listed recently should be processed?

sn2015.12.14.12.29.25.log
sn2015.12.14.12.29.25_a.log
sn2015.12.14.12.29.25_b.log
sn2015.12.14.12.29.25_c.log
sn2015.12.14.12.30.23.log
sn2015.12.14.12.30.13_b.log
sn2015.12.14.12.30.24_a.log
sn2015.12.14.12.30.25_b.log
sn2015.12.15.12.30.23.log
sn2015.12.15.12.30.23_a.log
sn2015.12.15.12.30.24_b.log

It seems that these are sorted, but with the suffix being ascending as well (a to z) so it's just a simple sort without parameters.

sort file.txt

Indeed, if file.txt is built from the output of ls in some way, I'm wondering why your output is not sorted already. Does this work:-

:
:
for file in fn*.log       # Specify matching pattern, i.e. exclude .doc files
do
    : # process $file
done
:
:

If there are too many files and the command fails for being too long, you could try:-

:
:
for file in `find . -name "fn*.log" | sort`
do
   : # process $file
done
:
:
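For the "process $file" step, the existence and size checks from the first post could look roughly like this. The function name check_log and the wording of the "does not exist" message are my own; only "file is of zero size" comes from the original request.

```shell
# Return 0 if file $1 exists and is larger than zero bytes;
# otherwise print a message and return 1.
check_log() {
    if [ ! -e "$1" ]; then
        echo "$1: file does not exist"
        return 1
    elif [ ! -s "$1" ]; then      # -s is true only for size > 0
        echo "$1: file is of zero size"
        return 1
    fi
    return 0
}
```

Inside either loop above you could then write check_log "$file" || continue before doing the real work on the file.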

You need to be very clear about what you need; otherwise we're just guessing whether your "My car doesn't work" type description means any of all sorts of serious things, or just that you are out of fuel.

We need accurate input to work on and meticulously checked required output to be certain we are going to be helpful and using our time well - we do all give it freely.

I do hope that we can still help you.

Robin