I have 133 .txt files in a directory that I am combining into one file. The problem is that when I use awk or cat to combine the files, I get output like this:
I know the input and output below do not match, but the input format is always the same; the spacing just seems to be off when I combine the files. If I copy and paste manually (copy file 1, then file 2, and paste them into one text file), I get the desired output.
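When a merge with cat or awk looks different from a manual copy and paste, two common culprits are Windows-style line endings (a carriage return, `\r`, before each newline) and a file whose last line has no newline after it. Below is a minimal sketch for checking which files have either issue. It assumes a POSIX shell with `grep`, `tail` and `printf`, and the function name `check_endings` is hypothetical:

```sh
# check_endings (hypothetical helper): report files with Windows (CRLF)
# line endings or without a newline after their last line.
check_endings() {
    cr=$(printf '\r')                 # a literal carriage-return character
    for f in "$@"; do
        if grep -q "$cr" "$f"; then
            echo "$f: contains carriage returns (Windows line endings)"
        fi
        # $(...) drops a trailing newline, so a non-empty result means
        # the last byte of the file is not a newline
        if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
            echo "$f: missing final newline"
        fi
    done
}

# Usage (path taken from the loop later in this thread):
# check_endings /home/cmccabe/Desktop/folder/*.txt
```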
example input
name 31 Index Chromosomal Position Gene Inheritance
122 2106725 TSC2 AD
124 2115481 TSC2 AD
121 2105400 TSC2 AD
82 135782221 TSC1 AD
81 135782026 TSC1 AD
126 2138218 TSC2 AD
123 2113107 TSC2 AD
125 2126142 TSC2 AD
name2 12 Index Chromosomal Position Gene Inheritance
1 43396568 SLC2A1 AD, AR
name3 20 Index Chromosomal Position Gene Inheritance
188 2135240 TSC1 AD
179 2103379 TSC1 AD
191 2137899 TSC2 AD
181 2110617 TSC2 AD
190 2137857 TSC2 AD
189 2137806 TSC2 AD
186 2133798 TSC2 AD
187 2135074 TSC2 AD
180 2105400 TSC2 AD
183 2122822 TSC2 AD
192 2138218 TSC2 AD
185 2125937 TSC2 AD
184 2125788 TSC2 AD
193 2138269 TSC2 AD
182 2112981 TSC2 AD
desired output
name 31 Index Chromosomal Position Gene Inheritance
82 135782221 TSC1 AD
81 135782026 TSC1 AD
name3 20 Index Chromosomal Position Gene Inheritance
188 2135240 TSC1 AD
179 2103379 TSC1 AD
Not the slightest idea what your problem is. Where does the combined output come into play? I can't find any of those lines in your desired output. What does "spacing is off" mean?
I hope this helps. I think the problem is that when I combine two files, the new file contains many extra newlines that are not there when I copy and paste. Thank you :).
awk 'FNR==1{print ""}{print}' *.txt > example.txt
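As written, `FNR==1{print ""}` prints an empty line before the first line of every input file, so that command inserts blank lines by itself. Also, because `example.txt` matches `*.txt`, running it a second time in the same directory would pick up the old `example.txt` as an input file. awk already ends every record it prints with a newline, so a file whose last line lacks one cannot run into the next file's header. The sketch below relies on that and also strips a trailing carriage return. The function name `combine_txt` and writing the result outside the input directory are assumptions:

```sh
# combine_txt (hypothetical helper): concatenate files into one output file.
# awk prints each record followed by a newline, so a file whose last line
# lacks "\n" no longer runs into the next file's first line.
combine_txt() {
    out=$1
    shift
    # drop a Windows carriage return at the end of each line, then print it
    awk '{ sub(/\r$/, ""); print }' "$@" > "$out"
}

# Usage: keep the output outside the directory being globbed, e.g.
# combine_txt /home/cmccabe/Desktop/combined.txt /home/cmccabe/Desktop/folder/*.txt
```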
desired output (no blank lines between the blocks)
name1 1 Index Chromosomal Position Gene Inheritance
176 40757228 ADSL AR
51 1.26E+08 ALDH7A1 AR
49 1.26E+08 ALDH7A1 AR
52 1.26E+08 ALDH7A1 AR
50 1.26E+08 ALDH7A1 AR
178 62857727 ARHGEF9 AD, AR
13 1.6E+08 ATP1A2 AD
name2 2 Index Chromosomal Position Gene Inheritance
102 52200340 SCN8A AD
134 61991153 CHRNA4 AD
136 62038585 KCNQ2 AD
name3 3 Index Chromosomal Position Gene Inheritance
122 2106725 TSC2 AD
124 2115481 TSC2 AD
121 2105400 TSC2 AD
name4 4 Index Chromosomal Position Gene Inheritance
4 43394661 SLC2A1 AD, AR
22 1.67E+08 SCN1A AD
name5 5 Index Chromosomal Position Gene Inheritance
75 52319081 EFHC1 AD, AR
51 1.67E+08 SCN9A AD
103 1.31E+08 SPTAN1 AD
84 1.47E+08 CNTNAP2 AD
134 6640393 TPP1 AR
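for f in /home/cmccabe/Desktop/folder/*.txt ; do
    bname=$(basename "$f")
    pref=${bname%%.txt}
    # strip trailing whitespace, including a Windows carriage return (\r),
    # from each line; tabs between columns are left alone
    sed 's/[[:space:]]*$//' "$f" > "/home/cmccabe/Desktop/new/${pref}_unix.txt"
done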
As a rule, I do not download attachments from any forum. If it is true that your lines contain tabs and spaces, that carriage returns are present, and that you would like to remove them, please try the following:
I apologize for the long post; I am trying to avoid attachments. If I merge file1.txt, file2.txt and file3.txt into one example.txt, things do not seem to copy over correctly, and then the awk command does not produce the desired output. If I copy and paste manually, the output is fine, but I have too many files to do that. cat doesn't seem to work either, and I'm not sure what else to try. Thank you :).
file1.txt
name 31 Index Chromosomal Position Gene Inheritance
122 2106725 TSC2 AD
124 2115481 TSC2 AD
121 2105400 TSC2 AD
82 135782221 TSC1 AD
81 135782026 TSC1 AD
126 2138218 TSC2 AD
123 2113107 TSC2 AD
125 2126142 TSC2 AD
file2.txt
name2 12 Index Chromosomal Position Gene Inheritance
1 43396568 SLC2A1 AD, AR
file3.txt
name3 20 Index Chromosomal Position Gene Inheritance
188 2135240 TSC1 AD
179 2103379 TSC1 AD
191 2137899 TSC2 AD
181 2110617 TSC2 AD
190 2137857 TSC2 AD
189 2137806 TSC2 AD
186 2133798 TSC2 AD
187 2135074 TSC2 AD
180 2105400 TSC2 AD
183 2122822 TSC2 AD
192 2138218 TSC2 AD
185 2125937 TSC2 AD
184 2125788 TSC2 AD
193 2138269 TSC2 AD
182 2112981 TSC2 AD
example.txt (file1.txt, file2.txt and file3.txt merged)
name 31 Index Chromosomal Position Gene Inheritance
122 2106725 TSC2 AD
124 2115481 TSC2 AD
121 2105400 TSC2 AD
82 135782221 TSC1 AD
81 135782026 TSC1 AD
126 2138218 TSC2 AD
123 2113107 TSC2 AD
125 2126142 TSC2 ADname2 12 Index Chromosomal Position Gene Inheritance
1 43396568 SLC2A1 AD, ARname3 20 Index Chromosomal Position Gene Inheritance
188 2135240 TSC1 AD
179 2103379 TSC1 AD
191 2137899 TSC2 AD
181 2110617 TSC2 AD
190 2137857 TSC2 AD
189 2137806 TSC2 AD
186 2133798 TSC2 AD
187 2135074 TSC2 AD
180 2105400 TSC2 AD
183 2122822 TSC2 AD
192 2138218 TSC2 AD
185 2125937 TSC2 AD
184 2125788 TSC2 AD
193 2138269 TSC2 AD
182 2112981 TSC2 AD
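In the merged file above, `AD` is followed directly by `name2` on the same line. That pattern is consistent with an input file that does not end with a newline: cat copies bytes unchanged and adds nothing between files. The sketch below reproduces the effect with shortened copies of file1.txt and file2.txt from this thread, written to a temporary directory. It then shows that `awk 1` keeps the header on its own line:

```sh
# Shortened copies of file1.txt and file2.txt from this thread, written to a
# temporary directory that is removed when the shell exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# file1.txt deliberately has no newline after its last line
printf 'name 31 Index Chromosomal Position Gene Inheritance\n125 2126142 TSC2 AD' > "$tmp/file1.txt"
printf 'name2 12 Index Chromosomal Position Gene Inheritance\n1 43396568 SLC2A1 AD, AR\n' > "$tmp/file2.txt"

# cat copies the bytes unchanged: "AD" and "name2 ..." end up on one line
cat "$tmp/file1.txt" "$tmp/file2.txt"

# awk 1: the always-true pattern "1" prints every record, each followed
# by a newline, so the header of file2.txt starts on its own line
awk 1 "$tmp/file1.txt" "$tmp/file2.txt"
```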
Output of my awk command, which matches a specific name and copies the header row where the match was found:
name 31 Index Chromosomal Position Gene Inheritance
82 135782221 TSC1 AD
81 135782026 TSC1 AD
188 2135240 TSC1 AD
179 2103379 TSC1 AD
Desired output
name 31 Index Chromosomal Position Gene Inheritance
82 135782221 TSC1 AD
81 135782026 TSC1 AD
name3 20 Index Chromosomal Position Gene Inheritance
188 2135240 TSC1 AD
179 2103379 TSC1 AD
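The awk command itself is not shown above. One way to get the desired output is to skip the merge and run awk over the individual files, remembering each block's header line and printing it only once, before the block's first matching row. This sketch makes three assumptions: header lines are recognised by the text `Index Chromosomal Position Gene Inheritance`, the gene symbol is the third whitespace-separated field, and the function name `filter_gene` is hypothetical:

```sh
# filter_gene (hypothetical helper): print the rows whose gene (field 3)
# equals $1, each group preceded by the header row of the block it came from.
filter_gene() {
    gene=$1
    shift
    awk -v gene="$gene" '
        { sub(/\r$/, "") }                      # ignore Windows line endings
        /Index Chromosomal Position Gene Inheritance/ {
            header = $0; shown = 0; next        # start of a new block
        }
        $3 == gene {
            if (!shown) { print header; shown = 1 }
            print
        }
    ' "$@"
}

# Usage: filter_gene TSC1 /home/cmccabe/Desktop/folder/*.txt
```

awk treats an unterminated last line as an ordinary record, so running it on the separate files avoids the `ADname2` problem. Because headers are detected by their text rather than by position, the same function should also work on a merged file once every input file ends with a newline.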