Warning while sorting : A newline character was added to the end of file

Hi,

I am trying to sort a CSV file of, say, 10 lines, where each line can be up to 30183 COLUMNS long (row length = 30183). Each line ends with a LINE FEED (LF). When I try to sort this file on, say, the second FIELD using the command below,

sort -t ',' +1 -2 file1.dat

the following warning appears on the command line:
"Warning : A newline character was added to the end of file file1.dat"

Also, 10 empty lines get added to the output along with the 10 sorted lines of data from file1.dat.
If I reduce the row size of a particular line to a few hundred COLUMNS (say, row size = 250), the warning doesn't appear and no empty line gets added for that particular line.

I am running the script in ksh on an AIX server.

Can anyone tell me why this is happening? Is this really because of the huge row length?

Upload the file you're sorting so we can see it.
Did you try using

tr -d '<special-chars, like line feed>' 

before sorting?

I guess the last line does not end with LF (0a hex),
instead there is a sequence of zero-bytes.
Please check with

od -x file1.dat | tail
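To look at only the final byte instead of the tail of the whole dump, something like the following should work. This is a minimal sketch: last_byte_hex is just a helper name made up for this post, and it assumes your tail accepts -c and your od accepts -An -tx1.

```sh
# Print the last byte of a file as two hex digits.
# "0a" means the file ends with LF; "00" would point to trailing zero bytes.
# (last_byte_hex is a hypothetical helper name, not a standard command.)
last_byte_hex() {
    tail -c 1 "$1" | od -An -tx1 | tr -d ' \t\n'
}
```

You would call it as: last_byte_hex file1.dat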

The sort utility is defined to work on text files. By definition, text files contain lines that are no longer than LINE_MAX bytes (including the line-terminating <newline> character). You can check the value of LINE_MAX on your AIX system using the command:

getconf LINE_MAX

The standards require LINE_MAX to be at least 2048 and that is the value supported on many UNIX systems.

I would guess that file1.dat contains lines longer than LINE_MAX bytes and you are seeing the undefined behavior that results when sort is asked to process a file that is not a text file.
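If you want to confirm that guess, you can compare the longest line in the file against LINE_MAX. The following is only a rough sketch: longest_line is a name invented here, and it assumes that awk on your system reads the long lines intact, which the standards do not guarantee either. LC_ALL=C is set so that length() counts bytes rather than multibyte characters.

```sh
# Print the length in bytes of the longest line, counting the trailing newline.
# (longest_line is a hypothetical helper name.)
longest_line() {
    LC_ALL=C awk '{ n = length($0) + 1; if (n > max) max = n }
                  END { print max + 0 }' "$1"
}
```

Compare the output of longest_line file1.dat with the output of getconf LINE_MAX. If the first number is larger, the file is not a text file as far as sort is concerned.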

You could use cut to split your input files with long lines into groups of files with shorter lines (copying the sort keys into each of the files), sort each of the files in the group, and then use paste to recreate your sorted output into a single file with long lines.

(Note, however, that sort -t "," +1 -2 will sort on the remainder of the input line if the sort key compares equal on some lines. So, if your sort keys are not unique, you need to duplicate enough fields to guarantee that you get the same sort order in all of your split files.)
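Here is one way the cut/sort/paste idea could look. It is an untested-on-AIX sketch, not a drop-in solution. Assumptions: fields contain no quoted commas, every line has the same number of fields, the file ends with a newline, and cut and paste on your system cope with the long lines. The function name split_sort_paste and its four arguments are made up for this example. Instead of duplicating extra fields to break ties, it adds the original line number as a second key, so every piece sorts into exactly the same order. It also uses -k1,1, the current spelling of the old +0 -1 key syntax.

```sh
# split_sort_paste INPUT OUTPUT NCOLS CHUNK   (hypothetical helper)
#   NCOLS = number of comma-separated fields per line
#   CHUNK = fields per piece; pick it so each piece stays under LINE_MAX bytes
split_sort_paste() {
    src=$1 dst=$2 ncols=$3 chunk=$4
    tmp=${TMPDIR:-/tmp}/ssp.$$
    mkdir "$tmp" || return 1
    set --                                  # reuse "$@" as the list of piece files

    # Fixed-width line numbers: a unique tiebreaker for equal sort keys.
    nlines=$(( $(wc -l < "$src") ))
    i=1
    while [ "$i" -le "$nlines" ]; do
        printf '%08d\n' "$i"
        i=$((i + 1))
    done > "$tmp/seq"

    # The sort key (field 2) of every line, in input order.
    cut -d, -f2 "$src" > "$tmp/key"

    start=1 n=0
    while [ "$start" -le "$ncols" ]; do
        end=$((start + chunk - 1))
        [ "$end" -gt "$ncols" ] && end=$ncols
        n=$((n + 1))
        # Build "key,lineno,<fields start..end>", sort, then drop the two helper fields.
        cut -d, -f"$start-$end" "$src" |
            paste -d, "$tmp/key" "$tmp/seq" - |
            sort -t, -k1,1 -k2,2 |
            cut -d, -f3- > "$tmp/piece.$n"
        set -- "$@" "$tmp/piece.$n"
        start=$((end + 1))
    done

    # Glue the sorted pieces back together into long lines.
    paste -d, "$@" > "$dst"
    rm -rf "$tmp"
}
```

A call would look like: split_sort_paste file1.dat sorted.dat 4000 200 (the field counts here are placeholders, use your real ones). Note that lines with equal keys come out in their original input order, which is not the same as what sort -t "," +1 -2 does when it falls back to comparing the rest of the line.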

Hi.

There is a perl version of sort at http://cpansearch.perl.org/src/CWEST/ppt-0.14/html/commands/sort/sort.pudge

I have not tried that code, but I generally have had good luck with ppt codes.

Posting a sample of the long-line input files would allow experimentation by responders.

Best wishes ... cheers, drl

Hi Don Cragon,
Thanks for the information about LINE_MAX. I wasn't aware of it. However, I need a simpler method to solve this problem.

Hi drl,
Will check that perl version of sort.