Hello,
I am trying to write a bash shell script that does the following:
1. Finds all *.txt files within my directory of interest
2. Reads each of the files (25 files) one by one; all are tab-delimited and share the same data format
3. Skips the first 10 rows of each file
4. Extracts columns 2, 14, and 15 and prints them into a single output file
5. Adds a new column to the final output file with the name of the .txt file from which the data was extracted
I have written a shell script, but it is not working properly and does not yet include the code to skip the first 10 rows.
Below I have pasted a sample input file, the desired output file, and my code.
Input file format:
TYPE	text	text	text	text	integer	float	float	text	text	text	integer	integer	integer	integer
FEPARAMS	Protocol_Name	Protocol_date	Scan_Date	Scan_ScannerName	Scan_NumChannels	Scan_MicronsPerPixelX	Scan_MicronsPerPixelY	Scan_OriginalGUID	Grid_Name	Grid_Date	Grid_NumSubGridRows	Grid_NumSubGridCols	Grid_NumRows	Grid_NumCols
DATAmiRNA-v1_95_May07 (Read Only)####################Agilent Technologies Scanner G2505B US45102930155a18d8bd4-628a-4054-b2ba-45c7a66de583016436_D_20070426############1119282
*
TYPE	float	float	float	integer	integer	float	integer	float	float	float	integer	float	float	integer
STATS	gDarkOffsetAverage	gDarkOffsetMedian	gDarkOffsetStdDev	gDarkOffsetNumPts	gSaturationValue	gAvgSig2BkgNegCtrl	gNumSatFeat	gLocalBGInlierNetAve	gLocalBGInlierAve	gLocalBGInlierSDev	gLocalBGInlierNum	gGlobalBGInlierAve	gGlobalBGInlierSDev	gGlobalBGInlierNum
DATA26.709275.44777100012031791.11899038.717365.42632.954291202965.42632.9542912029
*
TYPE	integer	integer	integer	text	integer	text	integer	integer	text	text	text	text	float	float
FEATURES	FeatureNum	Row	Col	chr_coord	SubTypeMask	SubTypeName	ProbeUID	ControlType	ProbeName	GeneName	SystematicName	Description	PositionX	PositionY
DATA	1	1	1		0		0	1	miRNABrightCorner30	miRNABrightCorner30	miRNABrightCorner30		6774.29	228.723
DATA	2	1	2		66	Structural	2	1	DarkCorner	DarkCorner	DarkCorner		6800.2	229.421
DATA	3	1	3	chr14:100595916-100595897	0		3	0	A_25_P00010115	hsa-miR-154*	hsa-miR-154*	NA	6826.51	228.385
DATA	4	1	4	chr8:135881995-135882010	0		5	0	A_25_P00010390	hsa-miR-30b	hsa-miR-30b	NA	6850.48	228.853
DATA	5	1	5	chr14:100558179-100558161	0		7	0	A_25_P00010956	hsa-miR-379	hsa-miR-379	NA	6875.37	228.408
DATA	6	1	6	chr19:058916206-058916186	0		8	0	A_25_P00011941	hsa-miR-517b	hsa-miR-517b	NA	6900.98	229.321
Output format: a tab-delimited file. The last column shows the name of the file from which the data was extracted.
1	6774.29	228.723	ABC.txt
2	6800.2	229.421	ABC.txt
3	6826.51	228.385	DEF.txt
4	6850.48	228.853	DEF.txt
5	6875.37	228.408	XYZ.txt
6	6900.98	229.321	XYZ.txt
My incomplete code:
find -name '*.txt' |
while read filename
do
awk -F"\t" -v name="$file"'
BEGIN {OFS="|"}
{print $2,$14,$15,name}
' $filename > output.txt
done
Thanks in advance for your help.
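
One possible way to cover the five steps listed above, offered as a sketch rather than a solution tested on these exact files: run awk once per file, skip the header rows with NR > 10, and redirect the whole loop to the output file a single time. The function name extract_columns and the output name combined.tsv are placeholders chosen for illustration. The sketch assumes bash, a find that supports -maxdepth and -print0, and an output file that does not itself match *.txt in the searched directory (otherwise find may pick it up as input).

```bash
#!/usr/bin/env bash
# Sketch: collect columns 2, 14 and 15 from every *.txt file in a directory,
# skipping the first 10 rows of each file and appending the source file name.
# extract_columns is a hypothetical helper name, not part of the original script.
extract_columns() {
    local dir=$1 out=$2

    # -maxdepth 1 limits the search to the given directory (assumed find extension);
    # remove it to descend into subdirectories as well.
    # -print0 with read -d '' keeps file names containing spaces intact.
    find "$dir" -maxdepth 1 -type f -name '*.txt' -print0 |
    while IFS= read -r -d '' filename; do
        # Tab in, tab out; name carries the file name without its directory.
        awk -F'\t' -v OFS='\t' -v name="$(basename "$filename")" '
            NR > 10 { print $2, $14, $15, name }   # rows 1-10 are header rows
        ' "$filename"
    done > "$out"    # redirect once, after the loop, so earlier files are not overwritten
}
```

It could be called as `extract_columns /path/to/directory combined.tsv`. Compared with the posted code, three things differ: the awk variable is filled from "$filename" (the posted script passes "$file", which is never set), OFS is a tab instead of "|", and the redirection sits after `done` rather than inside the loop, where `>` would truncate output.txt on every iteration. Rows come out in whatever order find lists the files.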