Now as per requirement , each row should have only 3 delimiter.
Now the 2nd row & last row has an extra delimiter, How to remove that ? In some large file having 100K data , there can be 100 such rows having extra pipe , how to remove it ?
Please, as we do not know your environment as you do, tell us about it:
your shell?
your OS?
the version of your OS?
Furthermore: i guess that your file can contain only ONE extra delimiter per line and the delimiter will contain no data, like:
a|b|c|d|e|||
Otherwise, you will have to explain what to do with such cases.
If this is so and you have a run-of-the-mill UNIX system you can try the following:
sed 's/^\(\([^|]*|\)\{3\}[^|]*\).*/\1/' /path/to/your/file
This will display the changed file only to screen. If you are satisfied with the outcome use:
sed 's/^\(\([^|]*|\)\{3\}[^|]*\).*/\1/' /path/to/your/file > /path/to/newfile
to save these results.
Explanation of the regexp:
[^|]*| matches a single cell, an arbitrary number of non-delimiters followed by a delimiter. This regexp is repeated three times: \([^|]*|\)\{3\} , then followed by an optional field content of non-delimiters: \([^|]*|\)\{3\}[^|]* .
All this is surrounded by brackets to use it as a back-reference. Any further content of the line is then included only to replace everything by the back-reference above so that effectively the rest of the line is deleted.
Thank you very much for your reply. I will check the env. details and will post.
linux/bash
regarding the scenario, I should have clearified it in first instance.
Use case is - Each row only have 4 Pipe delimiter not more than that. If there are 2/3/4 extra delimiter, it will need to be removed. Last field can/can't contain data as that is a nullable field.
I modified it to look at the first line, and use that as a model -- if that line is correct, then the following lines will be modified to conform to that.