OK, I think ahamed's observation is correct: the files are not true ASCII text files.
So I downloaded both files onto my Windows machine and used Cygwin Bash to prod around. Here's what I see:
$
$
$ # print the first line of file1.txt
$
$ head -1 file1.txt
c h r _ n a m e c h r _ s t a r t c h r _ e n d r e f _ b a s e a l t _ b a s e h o m _ h e t s n p _ q u a l i t y t o t _ dqa
$
$
$
The first character looks quite unusual, and there's a space between each character e.g. "c" + space + "h" + space + "r" instead of "chr".
The octal dump of the first line shows this:
$
$ # octal dump of the first line of file1.txt
$
$ head -1 file1.txt | od -bc
0000000 377 376 143 000 150 000 162 000 137 000 156 000 141 000 155 000
377 376 c \0 h \0 r \0 _ \0 n \0 a \0 m \0
0000020 145 000 011 000 143 000 150 000 162 000 137 000 163 000 164 000
e \0 \t \0 c \0 h \0 r \0 _ \0 s \0 t \0
0000040 141 000 162 000 164 000 011 000 143 000 150 000 162 000 137 000
a \0 r \0 t \0 \t \0 c \0 h \0 r \0 _ \0
0000060 145 000 156 000 144 000 011 000 162 000 145 000 146 000 137 000
e \0 n \0 d \0 \t \0 r \0 e \0 f \0 _ \0
0000100 142 000 141 000 163 000 145 000 011 000 141 000 154 000 164 000
b \0 a \0 s \0 e \0 \t \0 a \0 l \0 t \0
0000120 137 000 142 000 141 000 163 000 145 000 011 000 150 000 157 000
_ \0 b \0 a \0 s \0 e \0 \t \0 h \0 o \0
0000140 155 000 137 000 150 000 145 000 164 000 011 000 163 000 156 000
m \0 _ \0 h \0 e \0 t \0 \t \0 s \0 n \0
0000160 160 000 137 000 161 000 165 000 141 000 154 000 151 000 164 000
p \0 _ \0 q \0 u \0 a \0 l \0 i \0 t \0
0000200 171 000 011 000 164 000 157 000 164 000 137 000 144 000 145 000
y \0 \t \0 t \0 o \0 t \0 _ \0 d \0 e \0
0000220 160 000 164 000 150 000 011 000 141 000 154 000 164 000 137 000
p \0 t \0 h \0 \t \0 a \0 l \0 t \0 _ \0
0000240 144 000 145 000 160 000 164 000 150 000 011 000 144 000 142 000
d \0 e \0 p \0 t \0 h \0 \t \0 d \0 b \0
0000260 123 000 116 000 120 000 011 000 144 000 142 000 123 000 116 000
S \0 N \0 P \0 \t \0 d \0 b \0 S \0 N \0
0000300 120 000 061 000 063 000 061 000 011 000 162 000 145 000 147 000
P \0 1 \0 3 \0 1 \0 \t \0 r \0 e \0 g \0
0000320 151 000 157 000 156 000 011 000 147 000 145 000 156 000 145 000
i \0 o \0 n \0 \t \0 g \0 e \0 n \0 e \0
0000340 011 000 143 000 150 000 141 000 156 000 147 000 145 000 011 000
\t \0 c \0 h \0 a \0 n \0 g \0 e \0 \t \0
0000360 141 000 156 000 156 000 157 000 164 000 141 000 164 000 151 000
a \0 n \0 n \0 o \0 t \0 a \0 t \0 i \0
0000400 157 000 156 000 011 000 144 000 142 000 123 000 116 000 120 000
o \0 n \0 \t \0 d \0 b \0 S \0 N \0 P \0
0000420 061 000 063 000 062 000 011 000 061 000 060 000 060 000 060 000
1 \0 3 \0 2 \0 \t \0 1 \0 0 \0 0 \0 0 \0
0000440 147 000 145 000 156 000 157 000 155 000 145 000 163 000 011 000
g \0 e \0 n \0 o \0 m \0 e \0 s \0 \t \0
0000460 141 000 154 000 154 000 145 000 154 000 145 000 040 000 146 000
a \0 l \0 l \0 e \0 l \0 e \0 \0 f \0
0000500 162 000 145 000 161 000 015 000 012
r \0 e \0 q \0 \r \0 \n
0000511
$
$
So the first two bytes are the ones corresponding to octal 377 and 376; that's decimal 255 and 254, or hex FF FE, which is the UTF-16 little-endian byte-order mark (BOM). Also, there's a NUL byte i.e. chr(0) after each character; it is seen as "\0" in the octal dump above. In other words, "file1.txt" is UTF-16LE-encoded text, not plain ASCII.
The newline or End-of-Line (EOL) character is "\r\n" for Windows and "\n" for Unix/Linux. (It's "\r" for classic Mac OS, and "\n" for Mac OS X.) The "\r\n" is actually there at the end of the header line (the 015 ... 012 at the bottom of the dump), but like every other character it comes with NUL bytes attached, which would confuse awk or Perl just as badly.
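You can confirm and fix the encoding with standard tools. Here's a sketch using a small hand-made UTF-16LE sample so it's self-contained; the file names are made up for the demo, so substitute "file1.txt" in practice:

```shell
# Build a tiny UTF-16LE sample the way a Windows tool would write it:
# BOM (FF FE) first, then each character followed by a NUL byte.
printf '\377\376c\000h\000r\000\r\000\n\000' > file1_sample.txt

# "file" should identify the encoding (exact wording varies by version):
file file1_sample.txt

# iconv reads the BOM when you say "UTF-16" and emits plain ASCII:
iconv -f UTF-16 -t ASCII file1_sample.txt > file1_ascii.txt
od -bc file1_ascii.txt
```

After the iconv step the dump shows just "c h r \r \n" with no NUL bytes; the "\r" still needs handling separately.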
The other file - "file2.txt" appears to have "\r" characters as EOL.
$
$
$ # does "file2.txt" have any "\n" characters?
$
$ cat file2.txt | perl -lne '$count = s/\n//g; print "Number of \\n characters = $count"'
Number of \n characters =
$
$
$ # does "file2.txt" have any "\r" characters?
$
$ cat file2.txt | perl -lne '$count = s/\r//g; print "Number of \\r characters = $count"'
Number of \r characters = 13421
$
$
The "\r" character is the "Carriage Return" character (from the good ol' days of the typewriter); on a terminal it sends the cursor back to the start of the line, so subsequent text overwrites the text that was already printed. And since there are no "\n" characters at all, line-oriented tools see the entire file as one single line. The octal dump shows the difference clearly.
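You can see the overprinting effect with a one-liner (the sample text here is made up):

```shell
# "\r" sends the cursor back to column 1, so "xy" overprints "AB":
printf 'ABCDEF\rxy\n'
# the terminal shows: xyCDEF
```

All the bytes are still there in the output; only the on-screen rendering collapses, which is why od is the honest way to inspect such files.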
$
$
$ # what's the first occurrence of "\r" in file2.txt?
$
$ perl -lne 'print index($_, "\r")' file2.txt
162
$
$ # and the second?
$
$ perl -lne 'print index($_, "\r", 163)' file2.txt
264
$
$ # print the first 160 characters of file2.txt
$
$ perl -lne 'print substr($_,0,160)' file2.txt
chr_name chr_start chr_end ref_base alt_base hom_het snp_quality tot_depth alt_depth dbSNP dbSNP131 region geallele fres
$
$ # looks good, but print the first 200 characters now
$
$ perl -lne 'print substr($_,0,200)' file2.txt
chr01ame14930 14930tarA Ghr_end het_base137 65t_base33 som_het snp_quality tot_depth alt_depth dbSNP dbSNP131 region geallele freq
$
$
$ # doesn't look good because "\r" started overwriting the characters printed already
$ # od -bc shows it better; notice the "\r" (octal 015) at offset 0000240 below
$
$ perl -lne 'print substr($_,0,200)' file2.txt | od -bc
0000000 143 150 162 137 156 141 155 145 011 143 150 162 137 163 164 141
c h r _ n a m e \t c h r _ s t a
0000020 162 164 011 143 150 162 137 145 156 144 011 162 145 146 137 142
r t \t c h r _ e n d \t r e f _ b
0000040 141 163 145 011 141 154 164 137 142 141 163 145 011 150 157 155
a s e \t a l t _ b a s e \t h o m
0000060 137 150 145 164 011 163 156 160 137 161 165 141 154 151 164 171
_ h e t \t s n p _ q u a l i t y
0000100 011 164 157 164 137 144 145 160 164 150 011 141 154 164 137 144
\t t o t _ d e p t h \t a l t _ d
0000120 145 160 164 150 011 144 142 123 116 120 011 144 142 123 116 120
e p t h \t d b S N P \t d b S N P
0000140 061 063 061 011 162 145 147 151 157 156 011 147 145 156 145 011
1 3 1 \t r e g i o n \t g e n e \t
0000160 143 150 141 156 147 145 011 141 156 156 157 164 141 164 151 157
c h a n g e \t a n n o t a t i o
0000200 156 011 144 142 123 116 120 061 063 062 011 061 060 060 060 147
n \t d b S N P 1 3 2 \t 1 0 0 0 g
0000220 145 156 157 155 145 163 011 141 154 154 145 154 145 040 146 162
e n o m e s \t a l l e l e f r
0000240 145 161 015 143 150 162 060 061 011 061 064 071 063 060 011 061
e q \r c h r 0 1 \t 1 4 9 3 0 \t 1
0000260 064 071 063 060 011 101 011 107 011 150 145 164 011 061 063 067
4 9 3 0 \t A \t G \t h e t \t 1 3 7
0000300 011 066 065 011 063 063 011 163 012
\t 6 5 \t 3 3 \t s \n
0000311
$
$
$
Now if you are working with "file2.txt" in Mac OS, then you'd want to use MacPerl for processing, and I'd assume it takes care of EOL characters. I have no experience with any Mac system though, so don't quote me on that.
On the other hand, if you want to work in RedHat Linux, then you may want to ensure that the EOL characters are "\n" only, before running any of those shell scripts.
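A few one-liners for that conversion; the file names here are placeholders for whatever you're actually working with:

```shell
# Old-Mac style EOLs ("\r" only, like file2.txt) -> Unix "\n":
tr '\r' '\n' < file2.txt > file2_unix.txt

# Windows style EOLs ("\r\n") -> Unix "\n": just delete the "\r":
# tr -d '\r' < file_windows.txt > file_unix.txt

# Or let Perl normalize either style in place:
# perl -i -pe 's/\r\n?/\n/g' somefile.txt
```

If your system has dos2unix/mac2unix installed, those do the same job.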
You mentioned that those files exist as ".xlsx" files i.e. MS Excel 2007 or higher. In that case, saving them as "tab delimited" text files from Excel should be pretty straightforward.
tyler_durden