You can get the same result with many other UNIX text filters: sed , awk , and so on. All of these, though, will be far slower than the variable expansion, even though that takes an intermediate step. It would be possible to do it all in one step, but that would be ugly and cumbersome, while this remains readable and understandable.
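For example, both of the following extract the same substring from the sample filename used above; the parameter expansions run entirely inside the shell, while the sed version spawns an external process per invocation:

```shell
# Sample filename from the example above:
filename="ABC_DATA_BAD5A_RO_F_20161104.CSV"

# Pure shell parameter expansion -- no external process is started:
ftmp=${filename##*DATA_}        # gives "BAD5A_RO_F_20161104.CSV"
printf '%s\n' "${ftmp%%_F*}"    # gives "BAD5A_RO"

# The same extraction with sed -- costs an extra process:
printf '%s\n' "$filename" | sed 's/.*DATA_//; s/_F.*//'
```

Both print BAD5A_RO; the difference only matters when you run the extraction many times in a loop.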
Thanks Bakunin...
The solution you provided is for one file, so if I have multiple files it will be a tedious job.
Can it be done with one command/script that puts those file names into some other file?
I just need the file names...
for filename in *DATA_*_F*    # for filenames like ABC_DATA_BAD5A_RO_F_20161104.CSV
do    ftmp=${filename##*DATA_}    # gives "BAD5A_RO_F_20161104.CSV"
      result=${ftmp%%_F*}         # gives "BAD5A_RO"
      printf '%s\n' "$result"
done > list.txt
This extends bakunin's suggestion to work on all of the files in the current working directory that match the filename pattern you specified, putting the results in a file named list.txt in the same directory.
Of course, all of this assumes that you are using a shell that meets POSIX standard requirements for a shell. In the future when asking questions like this, please tell us what shell and what operating system you're using so we don't have to make so many assumptions.
Thanks Don Cragun.
I tested your code and it ran successfully for the files available in the current directory.
The challenge is that my files are at a Hadoop location, and I access those files from my bash prompt using the command below.
hdfs dfs -ls /user/cloudera/prod/SMS
I want the code to run for the files available at this Hadoop location (hdfs dfs -ls /user/cloudera/prod/SMS).
I am trying to figure out a solution for this.
cd /user/cloudera/prod/SMS
for filename in *DATA_*_F*
do    ftmp=${filename##*DATA_}    # gives "BAD5A_RO_F_20161104.CSV"
      result=${ftmp%%_F*}         # gives "BAD5A_RO"
      printf '%s\n' "$result"
done > list.txt
and if that doesn't work, and assuming that the command:
hdfs dfs -ls /user/cloudera/prod/SMS
gives you a list of filenames separated by sequences of spaces, tabs, and/or newline characters and that none of your filenames contain any space, tab, or newline characters, I would also try:
for filename in $(hdfs dfs -ls /user/cloudera/prod/SMS/*DATA_*_F*)
do    ftmp=${filename##*DATA_}    # gives "BAD5A_RO_F_20161104.CSV"
      result=${ftmp%%_F*}         # gives "BAD5A_RO"
      printf '%s\n' "$result"
done > list.txt
I have absolutely no experience with hadoop filesystems or utilities, so I have no confidence that either of these will work.
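That said, if hdfs dfs -ls prints ls -l style long-format lines with the pathname in the last whitespace-separated field (that output format is an assumption on my part; check it on your system first), a sketch like this might work:

```shell
# Assumptions: each "hdfs dfs -ls" output line ends with the pathname in its
# last whitespace-separated field, and no pathname contains whitespace.
hdfs dfs -ls /user/cloudera/prod/SMS |
awk '/DATA_.*_F/ { print $NF }' |
while IFS= read -r path
do    filename=${path##*/}          # strip the directory part
      ftmp=${filename##*DATA_}      # gives "BAD5A_RO_F_20161104.CSV"
      printf '%s\n' "${ftmp%%_F*}"  # gives "BAD5A_RO"
done > list.txt
```

The awk step keeps only lines matching your pattern and discards the permission/owner/size columns, so the shell loop only ever sees pathnames.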