Experts,
It has been a long time since I last programmed in shell, so I would like to ask for your suggestions on a challenge I am facing: parsing a string into a variable using "cut".
#Azure DataLake Path Of the File
DATASET_PATH="adl://xyz123.azuredatalakestore.net/devhdfs/DataWareHouse/sf_inbound/serviceappointment"
#Shell Function That Lists the Dataset Directory
hiveClass () {
hadoop fs -ls ${DATASET_PATH}
}
#Variable that Stores the Complete Path of the File
var=`hiveClass | grep -i "parquet" | cut -d' ' -f15`
Example of hiveClass when executed explicitly:
spark@hn0-emrazs:~$ DATASET_PATH="adl://xyz123.azuredatalakestore.net/devhdfs/DataWareHouse/sf_inbound/serviceappointment"
spark@hn0-emrazs:~$ hiveClass () {
> hadoop fs -ls ${DATASET_PATH}
> }
spark@hn0-xyz1:~$ var=`hiveClass | grep -i "parquet" | cut -d' ' -f15`
spark@hn0-xyz1:~$ echo "$var"
spark@hn0-xyz1:~$ var=`hiveClass | grep -i "parquet" | cut -d' ' -f14`
spark@hn0-xyz1:~$ echo "$var"
spark@hn0-xyz1:~$ var=`hiveClass | grep -i "parquet" | cut -d' ' -f16`
spark@hn0-xyz1:~$ echo "$var"
spark@hn0-xyz1:~$ var=`hiveClass | grep -i "parquet" | cut -d' ' -f13`
spark@hn0-xyz1:~$ echo "$var"
adl://xyz123.azuredatalakestore.net/devhdfs/DataWareHouse/sf_inbound/appointment/part-00000-aa60bb3c-6780-44fa-b93d-4232df81faa1-c000.snappy.parquet
Running the raw command directly gives this result:
sparksshuser@hn0-xyz1:~$ hadoop fs -ls adl://xyz123.azuredatalakestore.net/devhdfs/DataWareHouse/sf_inbound/appointment
Found 2 items
-rw-r-----+ 1 sparksshuser sparksshuser 0 2018-10-25 02:08 adl://xyz123.azuredatalakestore.net/devhdfs/DataWareHouse/sf_inbound/appointment/_SUCCESS
-rw-r-----+ 1 sparksshuser sparksshuser 594663 2018-10-25 02:07 adl://xyz123.azuredatalakestore.net/devhdfs/DataWareHouse/sf_inbound/appointment/part-00000-aa60bb3c-6780-44fa-b93d-4232df81faa1-c000.snappy.parquet
Now the real challenge is the field number passed to cut. It is not constant, so I cannot schedule my script reliably. It changes from run to run, apparently because the number of spaces before the path shifts as the file size in bytes grows.
var=`hiveClass | grep -i "parquet" | cut -d' ' -f13`
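To illustrate why the field number moves, here is a small sketch. It assumes the listing pads its columns with a variable number of spaces, and that cut -d' ' counts every single space as a delimiter. The two sample lines, their padding, and the helper name last_field_index are made up for the illustration and are not real output.

```shell
#!/bin/sh
# Two illustrative lines shaped like `hadoop fs -ls` output.
# Paths and padding are invented; only the size column width differs.
small='-rw-r-----+  1 user group       594663 2018-10-25 02:07 adl://example/a.parquet'
large='-rw-r-----+  1 user group    123594663 2018-10-25 02:07 adl://example/a.parquet'

# Hypothetical helper: print the 1-based index that cut -d' ' would
# assign to the last field of the line passed as $1.
last_field_index() {
  # cut treats each single space as a delimiter, so index = spaces + 1.
  spaces=$(printf '%s' "$1" | tr -cd ' ' | wc -c)
  echo $((spaces + 1))
}

small_idx=$(last_field_index "$small")
large_idx=$(last_field_index "$large")

echo "small file: path is cut field $small_idx"
echo "large file: path is cut field $large_idx"
```

The two indexes differ even though both lines have the same eight columns, which matches the behaviour I see between runs.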
My question: how can I capture the complete file URL in a variable, so that I can feed it into a Hive table, without relying on "cut -d' ' -f???"? The value I want is:
adl://xyz123.azuredatalakestore.net/devhdfs/DataWareHouse/sf_inbound/appointment/part-00000-aa60bb3c-6780-44fa-b93d-4232df81faa1-c000.snappy.parquet
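One direction that might avoid the fixed field number is to let awk split on runs of whitespace and take the last field ($NF) instead of counting spaces. This is only a minimal sketch, assuming the path itself contains no spaces and always comes last on the line. The helper name extract_parquet_path is hypothetical, and the sample listing reuses the output shown earlier with padding added for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads an `hadoop fs -ls` style listing on stdin
# and prints the last whitespace-separated field of every line whose
# last field ends in ".parquet" (case-insensitive).
extract_parquet_path() {
  awk 'tolower($NF) ~ /\.parquet$/ { print $NF }'
}

# Sample listing based on the output in this post; the extra spaces
# are an assumption standing in for the real column padding.
sample_listing='Found 2 items
-rw-r-----+  1 sparksshuser sparksshuser          0 2018-10-25 02:08 adl://xyz123.azuredatalakestore.net/devhdfs/DataWareHouse/sf_inbound/appointment/_SUCCESS
-rw-r-----+  1 sparksshuser sparksshuser     594663 2018-10-25 02:07 adl://xyz123.azuredatalakestore.net/devhdfs/DataWareHouse/sf_inbound/appointment/part-00000-aa60bb3c-6780-44fa-b93d-4232df81faa1-c000.snappy.parquet'

# Against the sample; in the real script the left side of the pipe
# would be:  hadoop fs -ls "${DATASET_PATH}"
var=$(printf '%s\n' "$sample_listing" | extract_parquet_path)
echo "$var"
```

Because awk ignores how many spaces separate the columns, the result should not depend on the width of the size column. If the directory ever holds more than one parquet file, var would contain several lines, so that case would need separate handling.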