I am currently using the command below to read all compressed .gz files from the source folder and append them, uncompressed, to a single target txt file.
The target txt file is getting too large; it is currently almost 350 GB.
hadoop fs -text /user/hive/warehouse/stage.db/CLINICAL_EVENT/CLINICAL_EVENT* | hadoop fs -put - /user/hive/warehouse/stage.db/Clinical_event/final/clinical_event.txt
Is there a way to create multiple output files when executing the above -text command, keeping each file to a maximum size of about 5 GB while appending?
As long as all the files are in the final folder, Hadoop will read them automatically.
Is there a way to name the files like:
clinical_event_1.txt
clinical_event_2.txt
clinical_event_3.txt
and so on?
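One approach I have been considering, but have not yet verified on my cluster, is piping the -text output through GNU split with --filter so each chunk is streamed straight to HDFS instead of landing on the local disk (this assumes a reasonably recent GNU coreutils split on the edge node; the prefix clinical_event_ and the 5G size are just placeholders, and -d produces numeric suffixes starting at 00 rather than 1):

# -C 5G caps each chunk at ~5 GB without splitting a record mid-line;
# -d adds numeric suffixes (clinical_event_00.txt, clinical_event_01.txt, ...);
# --filter pipes each chunk to hadoop fs -put instead of writing a local file.
hadoop fs -text /user/hive/warehouse/stage.db/CLINICAL_EVENT/CLINICAL_EVENT* \
  | split -d -C 5G --additional-suffix=.txt \
      --filter='hadoop fs -put - /user/hive/warehouse/stage.db/Clinical_event/final/$FILE' \
      - clinical_event_

Note the single quotes around the --filter argument: $FILE must be expanded by split for each chunk, not by the outer shell. Would something like this work, or is there a more standard way to do it?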
Thanks a lot in advance for any help.