cmccabe
September 20, 2016, 10:33am
1
I am trying to use sed to remove all lines in a file that are not vcf.gz. The sed below runs but returns all the lines with vcf.gz in them, rather than just the ones that end in only that extension. Thank you :).
file
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz.tbi
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.genome.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.genome.vcf.gz.tbi
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz.tbi
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.genome.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.genome.vcf.gz.tbi
desired output
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz
sed
sed -i '/.vcf.gz/!d' file
RudiC
September 20, 2016, 10:48am
2
sed '/\.vcf\.gz$/!d' file
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.genome.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.genome.vcf.gz
Again, your spec is incorrect: the desired output you posted does not match what that spec yields from your input, which is shown here.
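To illustrate why the command in post #1 kept the .tbi lines: without a trailing `$` the pattern may match anywhere in the line, and an unescaped `.` matches any character. A minimal sketch, assuming a throwaway file with shortened, hypothetical file names rather than the real paths:

```shell
# Build a small sample list (shortened, hypothetical file names).
list=$(mktemp)
printf '%s\n' \
  'a/TSVC_007.vcf.gz' \
  'a/TSVC_007.vcf.gz.tbi' \
  'a/TSVC_007.genome.vcf.gz' \
  'a/TSVC_007.genome.vcf.gz.tbi' > "$list"

# Unanchored: every line contains ".vcf.gz" somewhere, so nothing is deleted.
unanchored=$(sed '/\.vcf\.gz/!d' "$list")

# Anchored with $ and with literal dots: only lines ending in ".vcf.gz" survive.
anchored=$(sed '/\.vcf\.gz$/!d' "$list")

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
rm -f "$list"
```

The anchored form still keeps the .genome.vcf.gz line, since that name also ends in .vcf.gz.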
Hello cmccabe,
The desired output you have shown does not look like it contains only the records that have vcf.gz at the end; if that were the rule, two more records (lines 3 and 7 of your input) would be in your shown output. If you want the output as I described, you could try the following with sed.
sed -n '/\.vcf\.gz$/p' Input_file
Also, the sed -i option writes the output back into the input file itself, so please be careful when using it.
Thanks,
R. Singh
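The risk with -i can be reduced by asking sed to keep a backup copy. A small sketch, assuming a sed that accepts a suffix attached directly to -i (GNU sed and BSD sed both do); the directory and file names are hypothetical:

```shell
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
printf '%s\n' 'x.vcf.gz' 'x.vcf.gz.tbi' > "$work/list.txt"

# -i.bak edits list.txt in place and keeps the original as list.txt.bak.
sed -i.bak -n '/\.vcf\.gz$/p' "$work/list.txt"

edited=$(cat "$work/list.txt")
backup=$(cat "$work/list.txt.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"
rm -rf "$work"
```

Running the command without -i first and inspecting the output is another way to check the filter before committing to it.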
rbatte1
September 20, 2016, 11:32am
4
I suppose a sideways thought on this would be "How are you creating the list?"
If it is a find then you could add a bit that says -name "*.vcf.gz" as in:-
find /output/Home -name "*.vcf.gz"
I hope that this helps, or at least doesn't get in the way.
Robin
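If the list does come from find, the .genome.vcf.gz files could be excluded at that stage too. A sketch, assuming a scratch directory standing in for /output/Home; the file names are shortened and hypothetical:

```shell
# Create a scratch tree that mimics the layout in the thread.
root=$(mktemp -d)
mkdir -p "$root/IonXpress_007"
touch "$root/IonXpress_007/TSVC_007.vcf.gz" \
      "$root/IonXpress_007/TSVC_007.vcf.gz.tbi" \
      "$root/IonXpress_007/TSVC_007.genome.vcf.gz" \
      "$root/IonXpress_007/TSVC_007.genome.vcf.gz.tbi"

# -name matches the whole base name, so *.vcf.gz already skips *.vcf.gz.tbi;
# the negated second test drops the genome variants.
found=$(find "$root" -type f -name '*.vcf.gz' ! -name '*.genome.vcf.gz')
printf '%s\n' "$found"
rm -rf "$root"
```

With this, no later sed pass over the list would be needed.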
cmccabe
September 20, 2016, 11:33am
5
I cannot seem to remove the .genome.vcf.gz lines from the output. Thank you :).
file
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz
IonXpress_007
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz.tbi
IonXpress_007
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.genome.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz
IonXpress_007
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz.tbi
IonXpress_007
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.genome.vcf.gz
output
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.genome.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.genome.vcf.gz
desired output
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz
Hello cmccabe,
If the file name always ends with a digit followed by .vcf.gz (e.g. _007.vcf.gz or _008.vcf.gz), then the following may help.
sed -n '/[0-9]\.vcf\.gz$/p' Input_file
OR
awk '($0 ~ /[0-9]\.vcf\.gz$/)' Input_file
Thanks,
R. Singh
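A quick check of the digit-based pattern, assuming (as stated above) that the wanted names always have a digit directly before .vcf.gz; the sample names are shortened and hypothetical:

```shell
list=$(mktemp)
printf '%s\n' \
  'a/TSVC_007.vcf.gz' \
  'a/TSVC_007.vcf.gz.tbi' \
  'a/TSVC_007.genome.vcf.gz' > "$list"

# Keep lines where a digit directly precedes the literal suffix ".vcf.gz".
by_sed=$(sed -n '/[0-9]\.vcf\.gz$/p' "$list")
by_awk=$(awk '/[0-9]\.vcf\.gz$/' "$list")

printf 'sed: %s\nawk: %s\n' "$by_sed" "$by_awk"
rm -f "$list"
```

The genome line is dropped because the character before .vcf.gz is a letter there, not a digit.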
RudiC
September 20, 2016, 12:19pm
7
Please read your post#1 carefully. WHERE did you specify THAT?
Everyone who answered ran in the wrong direction first!
RudiC
September 20, 2016, 12:22pm
8
sed '/\.vcf\.gz$/!d;/genome/d' file
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_007/TSVC_variants_IonXpress_007.vcf.gz
/output/Home/Auto_user_S5-00580-5-Medexome_66_030/plugin_out/variantCaller_out.40/IonXpress_008/TSVC_variants_IonXpress_008.vcf.gz
bakunin
September 20, 2016, 12:24pm
9
Yes, either that or, if you want to filter out specifically the files *genome.vcf.gz, then:
sed '/\.vcf\.gz$/!d;/genome\.vcf\.gz$/d' /path/to/input
I hope this helps.
bakunin
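The two filters differ when "genome" appears elsewhere in a path. A sketch of that difference, using a hypothetical directory named genome_run that does not occur in the thread's data:

```shell
list=$(mktemp)
printf '%s\n' \
  'genome_run/TSVC_007.vcf.gz' \
  'genome_run/TSVC_007.vcf.gz.tbi' \
  'genome_run/TSVC_007.genome.vcf.gz' > "$list"

# Broad filter: drops any line containing "genome", including the directory name.
broad=$(sed '/\.vcf\.gz$/!d;/genome/d' "$list")

# Suffix filter: drops only lines that end in ".genome.vcf.gz".
narrow=$(sed '/\.vcf\.gz$/!d;/\.genome\.vcf\.gz$/d' "$list")

printf 'broad: [%s]\nnarrow: [%s]\n' "$broad" "$narrow"
rm -f "$list"
```

On the paths posted in this thread both commands give the same two lines; the suffix form is only the safer choice if directory names might contain "genome".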
cmccabe
September 20, 2016, 1:13pm
10
Thank you all :). I apologize for not being more clear in my post and will ensure that I am in the future, thanks again.