DOS/Windows CR to a UNIX LF 17 MB text file

Hello,

I am on a WinXP home machine with a 17 MB text file and I need to change the DOS/Windows CR to a UNIX LF.

Does anyone know how I can do this or even better a WinXP program that can do this for me ?

My hobby is my Family History. I know very little about all this type stuff and I need an expert to offer me some help.

I wish to use this windows / DOS program called IGM see here Doug & Anthea Macdonald Homepage to create my web pages on the fly from a GEDCOM file. This GEDCOM is nothing but a .txt file in a special format to meet the GEDCOM standard.

I need to find all Windows Carriage-Return (CR = ASCII 13) and replace them with the UNIX (LF = ASCII 10) at the end of each line in the place of the Windows CR.

I need to do this because this file will reside on an Apache server using UNIX and a .CIG script will be run on this file.

The compiler of this program says I will need to change the CR to LF but the program he has provided will not work or at least I can't get it to do anything for me.

The program seems to pre-date XP and I can't get in touch with anyone that compiled this program. Here is a link to the freeware program IGM Download Information

This is what it said in the how to file included with the IGM program.

"Finally, another difference between DOS/Windows and UNIX is that they denote the end of a line of text within a file in different ways. UNIX puts a Line-Feed character (LF = ASCII 10) at the end of each line, whereas DOS/Windows ends each line with the two-character combination Carriage-Return (CR = ASCII 13) and Line-Feed. Files and scripts with the DOS convention will be uninterpretable when they are uploaded to the UNIX server. Therefore, a small program called d2u.exe is included with the IGM programs. This file, which runs on a PC, removes the CR characters from the end of each line. You should run this program on any file you are going to upload just before uploading it. To run it, just type e.g.

 d2u igmget.cgi 

at the DOS prompt. Run this file on all of the CGI scripts, plus the GEDCOM file, and any other files you upload. Since this process makes permanent changes to the files which may make them uneditable on your home computer, it is strongly suggested that you make copies of all the files in your special pre-upload directory before performing this step, and make the changes on these copies. Note that whenever you upload a new GEDCOM file, you will have to run d2u.exe on it first. "

I have run that every way I can think of from the Run command in XP and have even rebooted into safe mode with a command line prompt and nothing.

I can't even tell what the end of line is in this file so how will I know if I get it changed ?

Thanks !

Tex & Linda Dix - Dick
305 Avalee Dr.
Brooks, GA 30205
tex@dixhistory.com
Tex and Linda Dix or Dick Family History

The provided utility may have worked correctly; why don't you upload the files and see if they work? Another possibility, if you have shell access, is to run the 'dos2unix' utility on the remote server after you upload a file.

Thanks for your reply SYSGATE.

I have uploaded the files but can't get it to work. I think I have other issues as I don't know what shell access is. I use a WS_FTP95 to upload files. I am past 60 and can't type and that type stuff so to let you know I am slow in all areas.

My domain and web pages are hosted by hostmonster and they use C-Panel but I know less about it than I know about Windows and UNIX.

I know the user guide talks about running stuff from the command line but I have no clue how I can do that using Windows XP and a FTP program. See my real concern at bottom and may well be all that I need do to get this thing up and running.

I put all the files in the CIG-BIN folder that he said and I created a folder named perl as my C-panel said I needed for perl script and put the files needed in that location along with my GEDCOM file. Transferred using ASCII only mode.

I also point everything to my server side and web address as it said or at least I think I did. I am going to go back over it all again this week-end.

The how to file then says if I have no telnet or shell I can do this:
" THAT SHOULD BE ALL THERE IS TO IT !!!

Although the above procedure assumed that you had TelNet access to your account, this is not really necessary. All of the setup can be done through FTP alone, with only a little more difficulty. All of the directory creations and file permission settings performed above can be done with any reasonably full featured FTP program, for instance WS_FTP from Ipswitch. And the IGMLivng and IGMMak scripts can be run via http requests by simply renaming them as ordinary CGI scripts and uploading them to your cgi-bin directory. Since they take single arguments, they can be run with a command like: http://domain/~you/cgi-bin/igmmak.cgi?Macdonald "

SYSGATE do you know from where I would issue that command and or how ?

I think that might be the real hold up as it seems I have all the other stuff done. I am sure about the GEDCOM, not sure about the LF, but as I understand it nowadays the FTP program set to ASCII should format it correctly. Only thing I can't see how to do is issue that command on the GEDCOM file from HostMonster, FTP or my web browser.

Thanks again for your help.

Tex-

Hi tex, unfortunately I'm not familiar with the software you are using or with GEDCOM, so I'd suggest that you speak with your hosting provider's tech support.

In fact transferring the file in ASCII mode should handle the line ending conversion during the transfer for you. It might (but most likely won't) make other changes, too, if the file contains special characters, but if as you say it's basically ASCII text, then transferring it in ASCII mode is really all it takes.
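
If you would rather do the conversion yourself before uploading, here is a minimal Python sketch of what a d2u-style tool does, assuming the file really uses CR LF pairs. The function names and the file names in the usage note are made up for illustration; this is not the IGM author's program.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CR LF pair with a single LF; all other bytes are untouched."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Write a Unix-style copy of src_path to dst_path, leaving the original alone."""
    # Binary mode stops Python from translating line endings on its own.
    with open(src_path, "rb") as src:
        data = src.read()  # a 17 MB file fits comfortably in memory
    with open(dst_path, "wb") as dst:
        dst.write(crlf_to_lf(data))
```

For example, convert_file("family.ged", "family_unix.ged") would leave family.ged as it was and write the converted copy next to it. If you upload the converted copy, use binary mode so the FTP program does not try to convert it a second time.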

There are various tools to look at the raw bytes in a file; one of the purposes of a hex editor is to be able to inspect the precise bytes in a file so you can spot e.g. line ending anomalies. The control character ctrl-J is called a "line feed" and is used to end a line on Unix systems (and thus on the hosting account you are using) whereas on legacy DOS-based systems you use two characters, a sequence of ctrl-M (carriage return) and line feed. In a hex editor, they will show up as 0D and 0A, respectively.
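
If you don't have a hex editor handy, a few lines of Python can answer the "how will I know if it changed" question by counting each kind of line ending. This is only a sketch; the function name is my own.

```python
def count_line_endings(data: bytes) -> dict:
    """Count DOS (CR LF), Unix (bare LF) and bare CR line endings in data."""
    crlf = data.count(b"\r\n")
    # Every CR LF pair contains one LF and one CR, so subtract the pairs
    # to get the bytes that stand alone.
    lf_only = data.count(b"\n") - crlf
    cr_only = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lf_only, "CR": cr_only}
```

Read the file with open(path, "rb").read() and pass the result in. A file in pure DOS format shows a large CRLF count and zeros for the other two; after a successful conversion the CRLF count should be zero and the LF count should equal the old CRLF count.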

Here's a hex dump of a fragment of text just to show you an example. You can see how each pair of hexadecimal (base-16) digits on the left correspond to one ASCII character on the right; for example, hex 65 is lower case "e".

54 68 65 20 63 6f 6e 74 72 6f 6c 20 63 68 61 72  The control char
61 63 74 65 72 20 63 74 72 6c 2d 4a 20 69 73 20  acter ctrl-J is 
63 61 6c 6c 65 64 20 61 20 22 6c 69 6e 65 20 66  called a "line f
65 65 64 22 20 61 6e 64 20 69 73 20 75 73 65 64  eed" and is used
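
A dump in roughly that layout can be produced with a short Python function. This is a sketch with a name of my own choosing; bytes that are not printable ASCII, including CR and LF, are shown as a dot in the right-hand column.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return a hex dump: hex byte values on the left, ASCII text on the right."""
    hex_width = width * 3 - 1  # two digits per byte plus single spaces between them
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad short final rows so the text column stays aligned.
        lines.append(hex_part.ljust(hex_width) + "  " + text_part)
    return "\n".join(lines)
```

Running it on the first few hundred bytes of a file, for example print(hex_dump(open(path, "rb").read(256))), is usually enough to see whether lines end in 0d 0a or in 0a alone.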

The convention to use hexadecimal (base 16) instead of the familiar base 10 (decimal) is a convenience; it means that all possible byte values can be represented with exactly two digits, and important "computer" numbers -- powers of two -- are easy to spot. Character codes below 32 (hex 20, the space character) are conventionally called "control characters"; this goes way back to the early formation of character sets in the 1950s and ASCII in the 1960s.

Another thing: You are consistently referring to "cig bin" but the correct path is usually cgi-bin, where CGI stands for Common Gateway Interface. Maybe it's as simple as having misspelled the folder name.

I wish to thank everyone who has helped by taking the time to make a reply.

era was right on both of his posts. I did misspell cig-bin and the transfer in ASCII must have worked because the program ran.

One other thing, that command to run http://domain/~you/cgi-bin/igmmak.cgi?Macdonald just type it into your web browser address window with your own stuff ...MAKE sure and spell everything right, with correct case. Hit enter or the go button... it did its thing. :)

Was it really so simple ? :) I'm glad the issue is resolved, nice catch Era.
And yes, Unix / Linux based hosting plans are case sensitive.

I guess I should take note of this bit of info. Now I have some productive use for my hex editor other than just debugging; I guess it would be useful to replace standard UNIX line-feed characters with DOS carriage-return + line-feed combos and vice versa from within my hex editor when I transfer text files among my computers; I've had issues in the past with text files from other operating systems, and maybe this is the cause. So thanks, y'all.
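
For the opposite direction (Unix to DOS), doing the replacement by hand in a hex editor gets tedious on large files. A rough Python sketch, with a function name of my own, might look like this. The one thing to watch is that a file which already has some CR LF endings must not end up with doubled CRs.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Convert line endings to DOS style (CR LF) without doubling existing CRs."""
    # Normalise to bare LF first so existing CR LF pairs do not become CR CR LF.
    unix = data.replace(b"\r\n", b"\n")
    return unix.replace(b"\n", b"\r\n")
```

Because of the normalising step, running it twice on the same data gives the same result as running it once.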