Thanks Jim and rbatte1 for your suggestions.
I could not reply earlier since I was weighing your options and checking what we have available in our infrastructure.
Jim's idea of NFS/NAS is good, but I am still not sure which is the better option: (a) storage that is NFS-mounted on the local Informatica Linux server and also mounted on an FTPS server, or (b) remote storage mounted only on an FTPS server. I am also open to any other alternative.
However, since Jim and rbatte1 still have questions about the setup on my end and the requirements, let me provide the additional information I have gathered.
We are using Red Hat Linux 5.10 on the Informatica server, which creates the files through ETL workflows. Any new servers we could get for FTPS or storage would run Red Hat 6.5 or above. The NFS version used in our firm is NFS4.
We have two datacenters about 20 miles apart.
Our Informatica server has a local Veritas Cluster File System. I believe it is SAN storage.
If a NAS filer were mounted on our Informatica Linux server at any mount point, it would be mounted via NFS4 and would probably be backed by a storage device such as Hitachi. The normal practice is to locate the NAS filer / storage device in the same datacenter as the application server on which it is mounted.
To give you an idea of the network speed: on a completely unrelated Red Hat 6.5 Linux server which already had a NAS mount available for testing, I copied a 3.4GB zip file from local SAN storage on that server to a NAS filer mounted on it via NFS4 (both in the same datacenter). The copy took in the range of 30-45 seconds.
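To put that test in more familiar units, here is a rough back-of-the-envelope calculation. It assumes 1 GB = 1000 MB and uses only the size and timings from my single test copy, so treat the result as indicative rather than a benchmark. The function name rate_mb_per_s is just something I made up for this sketch.

```shell
#!/bin/sh
# Rough throughput from the test copy: 3.4 GB in 30 to 45 seconds.
# Assumption: 1 GB = 1000 MB. The timings are from one manual test only.
rate_mb_per_s() {
    # $1 = size in GB, $2 = elapsed seconds
    awk -v gb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / s }'
}

fast=$(rate_mb_per_s 3.4 30)   # best case, 30 seconds
slow=$(rate_mb_per_s 3.4 45)   # worst case, 45 seconds
echo "about $slow to $fast MB/s"
```

So the NFS4 copy within one datacenter ran at very roughly 75 to 115 MB/s.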
Regarding Jim's other point that using FTPS tools like rsync/lftp would use network resources: if we were to request an FTPS server, both the FTPS server and its associated storage would be located in the same datacenter as our application server (the Informatica server) which generates the data files. Under this scenario, do you think FTPS would cause significant network overhead?
Regarding rbatte1's concern about handling incomplete files: the copy to the NFS mount, the remote copy, or the FTPS push to the FTPS server would happen only after the file has been fully generated by the ETL processes. I also ran a few tests transferring large files with lftp/rsync, and what I noticed was that, unlike the normal Unix 'cp' command, the lftp/rsync FTPS commands do not make a file available on the target FTPS server until every byte of the file has been transmitted by the client. So I think a partially available file on the target server is not an issue if we use FTPS to push the files.
However, if we were to use a NAS filer instead of an FTPS server, and 'cp' instead of the lftp/rsync FTPS commands, then the file might be partially visible while it is progressively being copied to the target storage. To avoid that, we would have to first use 'cp' to copy the file to a dummy name on the NFS mount and then run 'mv' to rename the dummy file to the final filename.
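A minimal sketch of that copy-then-rename step is below. The function name publish_file and the ".name.tmp.PID" naming are only my own placeholders, not anything we have in place. The temporary file is created in the same directory as the final file so that 'mv' is a plain rename on the same file system rather than a second copy. I am assuming consumers would not pick up the dot-prefixed temporary name; whether it shows up in listings depends on how the FTPS server is configured.

```shell
#!/bin/sh
# publish_file SRC DEST_DIR
# Copy SRC into DEST_DIR under a temporary dot-name, then rename it to the
# final name, so the final name never points at a half-written file.
# Hypothetical helper for illustration only.
publish_file() {
    src=$1
    dest_dir=$2
    name=$(basename "$src")
    tmp="$dest_dir/.$name.tmp.$$"    # temp name in the SAME directory

    # If the copy fails, remove the partial temp file and report failure.
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }

    # Same directory, same file system: mv is a rename, not another copy.
    mv "$tmp" "$dest_dir/$name"
}
```

It would be called after the ETL workflow has finished writing the file, for example: publish_file /path/to/local/file.txt /path/to/nfs/mount (both paths are placeholders).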
Here is some additional information in response to the other doubts about our requirement posed by rbatte1:
(1) Where is the datasource: The data is computed on the fly by the ETL workflows on the Informatica server. They read the data we receive from vendors, do a lot of matching and transformation to generate the final data, and presently write it to text files on the local file system of the same Informatica server. Going forward, we do not want the ETL workflows / processes to write this data to the local file system but to files on remote storage, which could be mounted on an FTPS server. All of this would mostly be in the same datacenter or, worst case, 20 miles apart in a different datacenter.
(2) Where are the clients: The clients / consumers are within the firm, so at best their client hosts are in the same datacenter as the FTPS server / storage device on which the files are located, and worst case they are in the other datacenter about 20 miles away. Client latency is not an issue, since for them it would just be a matter of changing the FTPS server name. My only concern is the latency up to the point when we place the data on the FTPS server.
(3) How much data are we talking: The ETL workflows generate data several times during the day. The files they generate range in size from a few kilobytes to as much as 5GB. On average they generate and have to save about 70GB of data in files daily.
(4) How often will it be written: There are several jobs scheduled to run at different times of the day, so there might be a job creating and writing files every 15 minutes. The process does not update existing files; every time it runs it generates new files. The previous day's files are purged from the system through archival processes, and we do not care about them. The ETL processes always generate new files from scratch since the financial data they contain has to be current.
(5) How often will it be read: There are over 100 consumers. They read these files at different times of the day. We do not know when the different consumers read the files we generate.
(6) Will a file be ignored if it has not been updated: Consumers pick up whatever files are present on the FTPS server. Our ETL workflows / processes make sure that a file is made available at the target location (which currently is on the local file system) only after the full file has been written.
(7) What is the network like: Since all the parties involved in the application are internal to the firm, the network communication is over the LAN. I do not have more specifics on the network speed, other than that it is a normal corporate network.
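Putting the figures from (3) and (4) together, here is a rough sketch of the sustained rate the transfer would need. It assumes 1 GB = 1000 MB, and it assumes (my assumption, not a stated requirement) that the largest 5GB file should be copied within one 15-minute scheduling slot. The function name required_mb_per_s is just a placeholder for this calculation.

```shell
#!/bin/sh
# Required transfer rates implied by the volumes above.
# Assumptions: 1 GB = 1000 MB; a 5 GB file should fit in a 15-minute slot.
required_mb_per_s() {
    # $1 = size in GB, $2 = time window in seconds
    awk -v gb="$1" -v s="$2" 'BEGIN { printf "%.2f\n", gb * 1000 / s }'
}

daily_avg=$(required_mb_per_s 70 86400)   # 70 GB spread over 24 hours
worst_file=$(required_mb_per_s 5 900)     # one 5 GB file in 15 minutes
echo "daily average: $daily_avg MB/s, largest file in 15 min: $worst_file MB/s"
```

Both numbers come out at a few MB/s or less, so the average load looks small; the real question is how bursty the jobs are.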
I would very much appreciate it if you could review the clarifications above and, based on them, recommend the optimal way to make these files available from our local Informatica Linux server on a remote storage device from which consumers can connect via FTPS and pull the files.
thanks