Need to sort text keeping first line always first

I have a file that is created from standard output.

I have put a leading space on the first line to force it to collate lower than the rest of the lines.

If I pass the entire file to the Linux sort, it ignores the leading space and the first line appears somewhere in the middle of the list.

If I add lots of leading spaces, it does not help.

I was thinking of putting leading !!!, which collates lower than the alphabet, and then running sed thereafter to eliminate the !!!.

Is that the best way or are there better ways?

Without knowing more about your data and the sort keys you're using to sort the data, it is hard to make suggestions about what might be the best way.

Your locale might be non-ASCII. Try

LC_ALL=C sort ...

or

LC_COLLATE=C sort ...
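
To see the effect of the collating locale, here is a minimal sketch. The three sample lines and the variable names are made up for illustration. In the C locale, sort compares raw byte values, so a leading space (0x20) sorts below every digit and letter:

```shell
# Hypothetical three-line sample; the header starts with a single space.
sample=$(printf '%s\n' 'zebra' ' Header line' 'apple')

# Force byte-wise (ASCII) collation for this one sort invocation.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort)

# The line with the leading space comes out first.
printf '%s\n' "$sorted"
```

In many other locales (en_US.UTF-8, for example) blanks and punctuation carry little or no weight in the first comparison pass, which would explain a header landing among the data lines.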

Hi,
Some version of GNU sort had a bug in which the '-b' option was enabled by default.
If the only alternative is to alter your line with '!!!', maybe this example will help you.
File example:

$ cat file
zzz
cat big 24
cat small   13
cat red 63
dog big 34
chicken plays   39
fish    red 294

The command that sorts the file while leaving the first line in place:

$ (head -1 file;sort <(sed -n '2,$p' file)) >file2

Result:

$ cat file2
zzz
cat big 24
cat red 63
cat small   13
chicken plays   39
dog big 34
fish    red 294

I have a file consisting of fixed columns: a title line showing where the file originates, followed by lines containing

progname|date|sha1 checksum|path

I want to sort by progname but to keep the top line.
The top line had

!!!Date=yymmdd, ...system = ssssssssss

I have tried putting characters with very low values in the leading columns of the title, but that did not work (Fedora Linux 20). I am not sure that -b, -d or -g has any effect on the sort (I could not detect any differences).
I just want a binary sort over the first 33 characters of each line. sort -k will limit the sort to those columns, but it ignored the !!! and therefore the Date= line appeared in the middle of the output.

I can upload a small sample unsorted file, if that helps.

Perhaps it would be easiest to just exclude the first line from sorting, for example:

{ IFS= read -r line; printf "%s\n" "$line"; sort;} < file > file.sorted

-edit
Just saw disedorgue posted something similar. This could be an alternative then. Just add sort options ad libitum..
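
To make the "add sort options" part concrete, the same idea can be wrapped in a small shell function. This is only a sketch: the name sort_keep_header and the sample data are made up, and it assumes the header is exactly one line.

```shell
# Hypothetical helper: pass the first line of stdin through unchanged,
# then sort the remaining lines with whatever options are given.
sort_keep_header() {
    # read consumes exactly one line and leaves the rest of stdin for sort;
    # the || test also accepts a header that lacks a trailing newline.
    if IFS= read -r header || [ -n "$header" ]; then
        printf '%s\n' "$header"
        sort "$@"
    fi
}

# Example: sort on the first '|'-separated field only.
printf '%s\n' 'title line' 'b|2' 'a|1' | sort_keep_header -t '|' -k1,1
```

Because the header never reaches sort, no marker characters are needed on the first line.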

Try just having your first line in your input file have a <space> as the 1st character on the line and sort with the command:

sort -t '|' 

Specifying a non-space field separator should make the leading space significant.

If you don't want a visible space at the start of the first line, make the first two characters in the file be a <space> character followed by a <backspace> character.
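
A quick way to produce and inspect such a header from the shell, as a sketch. The text Host=example is a placeholder, and cat -v is assumed to be available (it is in GNU and BSD cat, but it is not required by POSIX):

```shell
# Build a header whose first two bytes are <space><backspace>.
# printf interprets \b in its format string as a backspace (0x08).
header=$(printf ' \bHost=example')

# On a terminal the backspace hides the space; cat -v shows it as ^H.
visible=$(printf '%s\n' "$header" | cat -v)
printf '%s\n' "$visible"
```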

Thanks for the reminder. I actually had " \bHost = xxx, topdir=ddd etc"

but without the -t, sort interprets the line and treats it as Host=xxx ....

In 20 minutes, I will do the test as you suggested. The only change I have to do is test the sort parameters.


This is the top part of my file. Notice the !!^HHost= (^H=Backspace)

1to8.txt                         |20140223 0056|25d55ad283aa400af464c76d713c07ad|/home/leslie/Development/scandirFeb23/md5-c
alpha.txt                        |20140223 0056|83e065ac9ed97eca51391c20e9671373|/home/leslie/Development/scandirFeb23/md5-c
a.txt                            |20140223 0056|933222b19ff3e7ea5f65517ea1f7d57e|/home/leslie/Development/scandirFeb23/md5-c
crc32.c                          |20140223 0056|4d7a5dbb246898ff9d3ba19c0ded7f5b|/home/leslie/Development/scandirFeb23
crc32.h                          |20140223 0056|c15674694592358889120712db73be69|/home/leslie/Development/scandirFeb23
crc32.o                          |20140301 1912|10a49aede5f82d00205c1f89a8931731|/home/leslie/Development/scandirFeb23
DATE1                            |20140223 0056|e0167034133516d3ad5d61a09bae8156|/home/leslie/Development/scandirFeb23
DATE2                            |20140223 0056|e606fe0237c786174d2087090f81644a|/home/leslie/Development/scandirFeb23
daycalc.c                        |20140223 0056|1dd882b48e5c156748aba7fb38dbba51|/home/leslie/Development/scandirFeb23
dirdepth                         |20140301 1912|9f2ff1bd8b133ca0de8d124ad7d761d2|/home/leslie/Development/scandirFeb23
dirdepth.c                       |20140223 0056|a7c3f1c02245aec9a1b651e11018ff82|/home/leslie/Development/scandirFeb23
dirent.h                         |20140223 0056|1906fd554bf036fdf6ffd0b054ca321d|/home/leslie/Development/scandirFeb23
empty.txt                        |20140223 0056|d41d8cd98f00b204e9800998ecf8427e|/home/leslie/Development/scandirFeb23/md5-c
gcc.txt                          |20140223 0056|b8917c1a087abbf74f0294dad9cbf698|/home/leslie/Development/scandirFeb23
!! Host=Fedora20.Bachelor           |          |                       scan from| ^H/home/leslie/Development/scandirFeb23
inih_r27.tar                     |20140223 0056|a8da6db331c8fe638cbb8c6940ce303e|/home/leslie/Development/scandirFeb23
inih_r28Dec16.00.tar             |20140223 0056|6fe6356f0ba2e501c2713958f119d493|/home/leslie/Development/scandirFeb23
itcrftn.c                        |20140223 0056|b1f1444cfdc35b6427ad3b002a176e9f|/home/leslie/Development/scandirFeb23
log.txt                          |20140223 0056|187c3fdbde0febf71257f0c0da9e21e7|/home/leslie/Development/scandirFeb23/md5-c
makefile                         |20140223 0056|26955e927da56d1343af738a247b87e1|/home/leslie/Development/scandirFeb23/md5-c
makefile                         |20140301 1912|02c8266eb8c3d3b52eabb30378ef9895|/home/leslie/Development/scandirFeb23
md5       

Is the sort program too smart? I used '!' to collate low so that my title line stayed first. I also tried it with ' ' when piping to sort as ... | sort -t '|' > x

With the data shown in message #4 in this thread in a file named input.txt , I get the following data saved in the file named x from the command:

sort -t '|' -o x input.txt
!! Host=Fedora20.Bachelor           |          |                       scan from| ^H/home/leslie/Development/scandirFeb23
1to8.txt                         |20140223 0056|25d55ad283aa400af464c76d713c07ad|/home/leslie/Development/scandirFeb23/md5-c
DATE1                            |20140223 0056|e0167034133516d3ad5d61a09bae8156|/home/leslie/Development/scandirFeb23
DATE2                            |20140223 0056|e606fe0237c786174d2087090f81644a|/home/leslie/Development/scandirFeb23
a.txt                            |20140223 0056|933222b19ff3e7ea5f65517ea1f7d57e|/home/leslie/Development/scandirFeb23/md5-c
alpha.txt                        |20140223 0056|83e065ac9ed97eca51391c20e9671373|/home/leslie/Development/scandirFeb23/md5-c
crc32.c                          |20140223 0056|4d7a5dbb246898ff9d3ba19c0ded7f5b|/home/leslie/Development/scandirFeb23
crc32.h                          |20140223 0056|c15674694592358889120712db73be69|/home/leslie/Development/scandirFeb23
crc32.o                          |20140301 1912|10a49aede5f82d00205c1f89a8931731|/home/leslie/Development/scandirFeb23
daycalc.c                        |20140223 0056|1dd882b48e5c156748aba7fb38dbba51|/home/leslie/Development/scandirFeb23
dirdepth                         |20140301 1912|9f2ff1bd8b133ca0de8d124ad7d761d2|/home/leslie/Development/scandirFeb23
dirdepth.c                       |20140223 0056|a7c3f1c02245aec9a1b651e11018ff82|/home/leslie/Development/scandirFeb23
dirent.h                         |20140223 0056|1906fd554bf036fdf6ffd0b054ca321d|/home/leslie/Development/scandirFeb23
empty.txt                        |20140223 0056|d41d8cd98f00b204e9800998ecf8427e|/home/leslie/Development/scandirFeb23/md5-c
gcc.txt                          |20140223 0056|b8917c1a087abbf74f0294dad9cbf698|/home/leslie/Development/scandirFeb23
inih_r27.tar                     |20140223 0056|a8da6db331c8fe638cbb8c6940ce303e|/home/leslie/Development/scandirFeb23
inih_r28Dec16.00.tar             |20140223 0056|6fe6356f0ba2e501c2713958f119d493|/home/leslie/Development/scandirFeb23
itcrftn.c                        |20140223 0056|b1f1444cfdc35b6427ad3b002a176e9f|/home/leslie/Development/scandirFeb23
log.txt                          |20140223 0056|187c3fdbde0febf71257f0c0da9e21e7|/home/leslie/Development/scandirFeb23/md5-c
makefile                         |20140223 0056|26955e927da56d1343af738a247b87e1|/home/leslie/Development/scandirFeb23/md5-c
makefile                         |20140301 1912|02c8266eb8c3d3b52eabb30378ef9895|/home/leslie/Development/scandirFeb23
md5

The order of lines in the output is the same when the line containing Host= is:

!! Host=Fedora20.Bachelor | | scan from| ^H/home/leslie/Development/scandirFeb23
 ^HHost=Fedora20.Bachelor | | scan from| ^H/home/leslie/Development/scandirFeb23
        or
 Host=Fedora20.Bachelor | | scan from| ^H/home/leslie/Development/scandirFeb23

The system I'm using for this test is Mac OS X 10.7.5 running on a MacBook Pro laptop. This is the output I would expect for any sort utility conforming to the POSIX standards.

Note that the <space><backspace> before: /home/leslie/Development/scandirFeb23 in that line doesn't matter unless you're sorting on that field with something like:

sort -t '|' -k4 -o x input.txt
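
For reference, -k4 selects a sort key that starts at field 4 and runs to the end of the line, while -k1,1 restricts the key to field 1 alone. A small sketch with made-up rows (the variable names are illustrative only):

```shell
# Hypothetical data: field 1 and field 4 deliberately disagree on order.
rows=$(printf '%s\n' 'b|x|x|1' 'a|x|x|3' 'c|x|x|2')

# Key is field 1 only: lines come out in a, b, c order.
by_first=$(printf '%s\n' "$rows" | LC_ALL=C sort -t '|' -k1,1)

# Key starts at field 4: lines come out ordered by 1, 2, 3.
by_fourth=$(printf '%s\n' "$rows" | LC_ALL=C sort -t '|' -k4)

printf '%s\n' "$by_first" '--' "$by_fourth"
```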

I ran sort as you indicated and it worked.

I have tried to understand why you used -k4

I was trying to sort by the first column. And for all that I tried,
the Host=Fedora .... was placed somewhere in the middle of the output file.

If I sorted on fields 2,3,4, the sort yields what I require.

If I did not specify a "-b" with the sort, it should assume the leading blanks are part of the field and should not skip over them.

Thank you for your patience and help.

I'm glad that it worked for you.

In the input sample you provided the header line was:

!! Host=Fedora20.Bachelor | | scan from| ^H/home/leslie/Development/scandirFeb23

What I was saying was that the <space><backspace> at the start of the last field (shown as " ^H" above) doesn't make any difference unless you're sorting on the 4th field instead of the 1st field. I didn't understand why those characters were present in your sample input.

Yes. You are correct. If sort on your system behaved as specified by the standards, the -t option should not be needed in this case. I suggested using the -t option because of disedorgue's comment earlier in this thread that some version of GNU sort had a bug in which the -b option was enabled by default.

Since there is an interaction between the -b and -t options (although it isn't as clearly specified in the Linux sort(1) man page as it is in the POSIX sort utility man page), I thought that if your version of sort did have this bug, using the -t option might provide a workaround.

I guess I was bleary eyed last night when I indicated that everything was ok, but ....
Sorting for columns 2 through 4 works fine.

How do I make the sort work for column 1?
Do I need to add a leading | symbol before column 1?

Please refer to the list I posted yesterday, a few messages back.
Is the problem me or the sort's limitations?

I assume that you're referring to the listing in message #8 in this thread.

Note the discrepancy between your description and your data. Your description of the sample input shows !!\bHost= (where \b is the backspace character) but the data in the file shows !! Host= (with a space instead of a backspace). If that file did contain a backspace before Host= instead of a space, then sorting that file using the command:

sort -fd input_file

would produce output exactly matching the input you showed us.

Please upload the exact sample input_file you're using, show us the exact command line you're using to sort that file, the exact output you're getting from that command line, and the exact output you're trying to get.

Unless you want exclamation points in the header in your output, please use a space followed by a backspace as the 1st two characters on the header line instead of two exclamation points followed by a space.

Hi Don

Thank you for your posting. I too, discovered the -f option this evening and then I read your response when I came to post my finding.

The data you see above had many many iterations to try to get it to work to my requirements. My original output was produced in the printf statement beginning printf( " \bHOST=%s... ..); (one space and one backspace before the H)

From what I understood, the sort command, if issued against x.raw, the file to be sorted, with the following command line

   sort -k1 -t '|' -o x.sorted  x.raw 

should keep the first line invariant, but it does not. It appears to require the -f to almost meet my needs.

The manual states that -f folds lowercase characters to uppercase, so that case is ignored.

I was really after the ASCII collating sequence. With the -f option, DATE1 and DATE2 are in the wrong place, but the first line stays first, as desired.

Is it possible that the sort is missing an option to "just sort a column", purely respecting the ascii contents of the field?

If the above answer is no, then if it was up to me, I would request a -e parameter (when used with -t ). It would be used to stop interpretation of leading blanks and non-alpha characters.
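
A minimal sketch of the difference between the two orderings, using three file names from the listing earlier in the thread. It assumes the C locale, where a plain sort orders by byte value (uppercase before lowercase) and -f folds case before comparing:

```shell
names=$(printf '%s\n' 'crc32.c' 'DATE1' 'a.txt')

# Byte order: 'D' (0x44) sorts before 'a' (0x61).
ascii_order=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Case folded: the names are compared as A.TXT, CRC32.C, DATE1.
folded_order=$(printf '%s\n' "$names" | LC_ALL=C sort -f)

printf '%s\n' "$ascii_order" '--' "$folded_order"
```

So in the C locale a plain sort, with no -f, already gives the pure ASCII ordering of the field.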

In closing, thanks for your help and for the others in the forum who responded.

You did not upload your input file as I requested.

You did not tell us what operating system and version you're using as I requested.

You did not show us the output you are trying to get as I requested.

If your input file is encoded using ASCII and your locale has a collating order that matches ASCII unsigned character ordering, the simple command:

sort -o x.sorted x.raw

should do what I think you're trying to do. (And, on Mac OS X, it does.)

If the command:

sort -o x.sorted x.raw

or the command:

LC_ALL=C sort -t '|' -k1,1 -o x.sorted x.raw

doesn't produce the output:

Host=Fedora20.Bachelor           |          |                       scan from|/home/leslie/Development/scandirFeb23
1to8.txt                         |20140223 0056|25d55ad283aa400af464c76d713c07ad|/home/leslie/Development/scandirFeb23/md5-c
DATE1                            |20140223 0056|e0167034133516d3ad5d61a09bae8156|/home/leslie/Development/scandirFeb23
DATE2                            |20140223 0056|e606fe0237c786174d2087090f81644a|/home/leslie/Development/scandirFeb23
a.txt                            |20140223 0056|933222b19ff3e7ea5f65517ea1f7d57e|/home/leslie/Development/scandirFeb23/md5-c
alpha.txt                        |20140223 0056|83e065ac9ed97eca51391c20e9671373|/home/leslie/Development/scandirFeb23/md5-c
crc32.c                          |20140223 0056|4d7a5dbb246898ff9d3ba19c0ded7f5b|/home/leslie/Development/scandirFeb23
crc32.h                          |20140223 0056|c15674694592358889120712db73be69|/home/leslie/Development/scandirFeb23
crc32.o                          |20140301 1912|10a49aede5f82d00205c1f89a8931731|/home/leslie/Development/scandirFeb23
daycalc.c                        |20140223 0056|1dd882b48e5c156748aba7fb38dbba51|/home/leslie/Development/scandirFeb23
dirdepth                         |20140301 1912|9f2ff1bd8b133ca0de8d124ad7d761d2|/home/leslie/Development/scandirFeb23
dirdepth.c                       |20140223 0056|a7c3f1c02245aec9a1b651e11018ff82|/home/leslie/Development/scandirFeb23
dirent.h                         |20140223 0056|1906fd554bf036fdf6ffd0b054ca321d|/home/leslie/Development/scandirFeb23
empty.txt                        |20140223 0056|d41d8cd98f00b204e9800998ecf8427e|/home/leslie/Development/scandirFeb23/md5-c
gcc.txt                          |20140223 0056|b8917c1a087abbf74f0294dad9cbf698|/home/leslie/Development/scandirFeb23
inih_r27.tar                     |20140223 0056|a8da6db331c8fe638cbb8c6940ce303e|/home/leslie/Development/scandirFeb23
inih_r28Dec16.00.tar             |20140223 0056|6fe6356f0ba2e501c2713958f119d493|/home/leslie/Development/scandirFeb23
itcrftn.c                        |20140223 0056|b1f1444cfdc35b6427ad3b002a176e9f|/home/leslie/Development/scandirFeb23
log.txt                          |20140223 0056|187c3fdbde0febf71257f0c0da9e21e7|/home/leslie/Development/scandirFeb23/md5-c
makefile                         |20140223 0056|26955e927da56d1343af738a247b87e1|/home/leslie/Development/scandirFeb23/md5-c
makefile                         |20140301 1912|02c8266eb8c3d3b52eabb30378ef9895|/home/leslie/Development/scandirFeb23
md5

(which is what I think you're trying to get) when the 1st two characters of the first and last fields on the 1st line are <space><backspace>, I don't know what else to suggest. Both of the above commands produce this output when using sort on Mac OS X and should produce the same output on any system with a sort utility that conforms to the standards.

Are you sure that you don't have an alias in place for sort that is adding options you don't want in this case? (What output do you get from the command: type sort ?)
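
One way to check, as a sketch: type reports how the shell resolves the name, and the command builtin bypasses any alias or shell function called sort. The two-line input is made up for illustration.

```shell
# Report how the shell resolves "sort" (alias, function, or a path).
type sort

# "command" skips aliases and shell functions, so the real utility runs.
plain=$(printf '%s\n' 'b' 'a' | command sort)
printf '%s\n' "$plain"
```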

To All

I tried each of the examples you presented and I want to thank you for the help.

You asked for an uploaded file, and one is attached.

To upload, I had to change a filename from x.raw to x.txt

The line I confirmed worked on my system was the one proposed. It is the

LC_ALL=C sort -t '|' -k1,1 -o x.sorted x.txt

and it works. I am delighted.

I was not aware of how to specify the LC_ALL=C or its significance.
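
For anyone else reading: LC_ALL overrides LANG and every other LC_* variable, and the value C selects the POSIX locale, in which collation is by byte value. Written as a prefix, the assignment applies only to that one command. A small sketch with made-up input:

```shell
# Remember the shell's own setting (often unset) for comparison.
before=${LC_ALL-}

# The assignment is placed only in the environment of this sort process.
result=$(printf '%s\n' 'b' 'B' 'a' | LC_ALL=C sort)

# The shell's LC_ALL is unchanged afterwards.
after=${LC_ALL-}

printf '%s\n' "$result"
```

To apply it to a whole script instead, export LC_ALL=C once near the top.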

Here is the slightly larger file I was using (from the same population)

To upload, as I mentioned, I had to change the upload filename to x.txt

I intend to put the scanner into the public FOSS domain.

The purpose of the scanner is to obtain a checksum (md5 or sha1, or crc32) of all files beneath a given directory.

A blacklist facility to omit some subdirectories is part of the scanner.

Why do this? Auditors where I was working wanted to detect changes in production files (ERP object or source code). The scanner would be run daily, since the auditors did not require realtime notices. They wanted proof that any software implementation followed corporate implementation rules.

This software was developed on my own computers in my home.

Want a source copy? Just ask for one. Want to collaborate to extend it? ditto.

Thank you all,

Leslie

PS. My response was late. Your messages were filtered into SPAM. I check SPAM every few days and that is why today, now, I am responding.