Comma inside CSV file and PERL split

Need to print the " " in a single filed in split function.

field1,field2,field3:

foo, "bar, baz", baz

output expected :

foo
"bar,baz"
baz

syntax

my @elem = split /\,/;

Output Tried

foo
"bar
baz"
baz

Isn't this what you would expect?

You split a string based on a comma, and you got exactly what is expected, right @nmkl2021

foo
"bar
baz"
baz

What else would you expect when you split your example string based on a comma?

???

What is your next step? Are you going to recombine the "parts" (results from your split) as you want? Do you want to try another method to split using a different field separator?

It's good to actually think about how to process your string, step-by-step, don't you agree?

1 Like

i am reading the value in array. excel -> csv with comma.

Currently the inputs are below

21W10,,10000,,"Abrien, geoogd",300,gram,79,stuks,233.04999999999987,0.0,n,Bakproducten,Bloem & meel,"Sulfiet, ","Noten, Pinda, Gluten, ",Turkije,Non-EU,TRUE,Nee
21W10,61193,10001,,"Adelen, onrand",320,gram,76,stuks,300.19999999999965,0.0,n,"Noten, zaden, gedroogd fruit & SF",Noten,"Noten, ","Pinda, Gluten, ",USA,Non-EU,TRUE,Nee

code

 chomp;
        my @elem = split /\,/;
           if ( $#elem !=19 ) 

current output

21W10
10000
Abrien
geoogd

Currently, i would like to vomit the comma inside the double quote ( " , ") and the expected output in the array would be below.

In the excel the filed quoted " " double quote is a single field. so i need to read the " " into a single field and consider a single value.
Expected output by vommiting comma inside double quoted.

21W10
10000
Abrien geoogd

In the excel the filed quoted " " double quote is a single column. so i need to read the " " into a single field and consider a single value.

Yes, I understand completely. This is a very common issue parsing CSV files.

"C" in CSV stands for "comma", so you cannot simply split based on a comma unless you plan to loop though your parts and combine them.

Or you can just do like most of us (lazy system admins) and use a CSV parser which takes care of these kinds of common CSV parsing issues. This issue is nearly as "old as the hills". Why reinvent the wheel writing code to solve issues (like commas in the values of a CSV file) addressed a long time ago?

Why not add and test a CSC parser?

PERL has mature CSV parsers, for example:

2 Likes

Hi @nmkl2021,

the input is badly formatted. Either all fields should be quoted or, even better, a character should be selected as the field separator that does not appear in the fields themselves, e.g. TAB, # or |. Where does the input come from resp. do you have the possibility to control its format?

1 Like

Yes true.I dont have the control to change the format @bendingrodriguez .

True @Neo but i am receving the input files as comma demilited file.

Of course you don't. The issue is not the formatting of the CSV file. That's a given.

The issue is how to parse a CSV file with commons in the values, and as I said, this issue has been addressed for "ages" in IT. This is a common issue.

Discussing "why is it formatted this way" @bendingrodriguez is off topic , to be honest. The file is "what it is".... and the issue is to guide the OP to process what he has at hand.

@nmkl2021 ... you can use a CSV parser for PERL which already solves this problem, as I mentioned. Just try one, like the one I suggested, and get it to work.

Or you can "reinvent the wheel" and write you own CSV parser.

:slight_smile:

Yes, I know.

Just use a CSV parser for PERL which can handle / process embedded commas.

This is the third time I have mentioned that to you.,

What is not clear about my suggestion.? I provided a link to a CSV parser for PERL

Did you try it as I suggested?

Here it is again:

Or you can google and find other CSV parsers for PERL that can parse a CSV file like yours with embedded commas in the values.

This problem is "as old as the hills" as the saying goes.

Note: Did you google:

parse CSV file with embedded commas

??

You can also write a REGEX to do this (or google one). As I mentioned it is common to have such issues in CSV files.

As a side note, I import a lot of CSV data into a DB using Ruby (it's a Rails project) and sometimes the CSV data provided to me has this same issue (commas embedded in the data).

It looks like the fields within the quotes are separated by , (note the space), so you could replace that with another character like #. Then you can split on the comma and re-replace # by , . Or use a CSV parser, like @Neo suggested.

BTW, I still consider my question to be appropriate. How do we know a priori of course that the OP has no control over the input? Or did I miss something? Anyway, not important :slight_smile:

Here is an example (in Ruby) of how to parse this using a CSV parser, using @nmkl2021 original data set and providing the results he desires:

$ cat csv_parse.rb
foo, "bar, baz", baz
cat nmkl2021.rb
#!/Users/Tim/.rvm/rubies/ruby-2.7.0/bin/ruby
require 'csv'
input_string = File.read("./csv_parse.rb").force_encoding("ISO-8859-1").encode("UTF-8")
data = CSV.parse(input_string, :col_sep => ", ") 
puts data
$ ./nmkl2021.rb
foo
bar, baz
baz

Note, it can be simplified to:

#!/Users/Tim/.rvm/rubies/ruby-2.7.0/bin/ruby
require 'csv'
input_string = File.read("./csv_parse.rb")
data = CSV.parse(input_string, :col_sep => ", ") 
puts data

and still works as @nmkl2021 desires:

$ ./nmkl2021.rb
foo
bar, baz
baz

This is one of many good reasons to use CSV parsers when parsing CSV files.

As mentioned, this is a "trivial problem" which has been addresses by countless people working with CSV files where the data is "not clean" and "has embedded commas" in the data.

@nmkl2021, I don't program in PERL these days (was a huge PERL fan in the 1980s and 1990s). Now I prefer an object-oriented scripted language, in this case Ruby, for these kinds of simple tasks. I like OO scripting.

If you want to use PERL, you can find a parser for PERL which has similar methods / flags to deal with these cases (commas in the data). Or you can use my Ruby code I put together for you; or choose another route / path / direction.

HTH

No worries.

The OP confirmed he has no control over the data.

The reason I assume this (correctly) is because I work in IT doing this kind of work for many decades in practice, and my experience is that I am given CSV data to parse (by the customer or client); and rarely do I create CSV data only to parse it. If that was the case, I would, generally speaking as a CSV "bored to death parser" skip the step of writing the data as CSV in the first place.

In other words, I generally do not do this:

  1. Create a CSV file from "the data"
  2. Parse the CSV file.
  3. Process the parsed CSV data

I generally, as a developer, just

  1. Process "the data".

My instincts are pretty good about these things since I would not, as mentioned, create a CSV file only to decode the CSV and then process it. This is especially true when we see CSV files with embedded commas. This generally indicates that someone has dumped some data as CSV and passed it along to someone else to further process it (like the customer to a developer).

Frankly speaking, people have dumped (handed me) so many CSV files over the years to upload to a DB, that I find this the "low light" of my IT life, LOL. But it goes with the territory, as the saying goes; and often customers / clients have little choice but to dump to CSV and pass that data along to me expecting it to be cleanly uploaded to an SQL database.

Hope this helps (explain why I assumed the data was "handed over" to @nmkl2021 to process).

@Neo,

you have been working in the IT for decades and of course you know exactly how to handle CSVs. So for sure, you would not create a CSV file only to decode the CSV and then process it. But I'd not be sure if that's the case with the OP too.

1 Like

Point well taken.

I tried to parse the csv and could see the module is not installed in my machie.

csv

Tudor,Vidor,10,Hapci
Szundi,Morgo,7,Szende
Kuka,"Hofeherke, alma",100,Kiralyno
Boszorkany,Herceg,9,Meselo

code

#!/usr/bin/perl
use strict;
use warnings;
 
use Text::CSV;
my $csv = Text::CSV->new({ sep_char => ',' });
 
my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";
 
my $sum = 0;
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$data>) {
  chomp $line;
 
  if ($csv->parse($line)) {
 
      my @fields = $csv->fields();
      $sum += $fields[2];
 
  } else {
      warn "Line could not be parsed: $line\n";
  }
}
print "$sum\n";

Yes, you normally need to install PERL modules like this. The reason people generally use PERL is because they can install PERL modules as needed :slight_smile: Did you install it?

Maybe you should use Ruby, since there is no need to install the core csv libs, as it comes with the distro?

Note: If you cannot install PERL modules, then there is little reason to use PERL, since one of the core design themes around PERL is to install modules and add capability as required. On the other hand, this is also true of Python, Ruby, Javascript, C, C++, PHP and most programming languages.

I am in beginer level in perl. any other method in perl to vommit comma inside the double qouted string.

Learn to install PERL modules. That is a basic beginners task.

Or learn to use another programming language.

Is this homework?

You have been given the normal solution; but you seem to want to use a different method, I am guessing because this is an assignment from your school.

If this was not homework, you could easily solve this problem by the method (s) I provided (Use CSV parsing methods).

So, I'm guessing this is homework and you must use PERL and you cannot use a CSV parser.

Correct?

Yes it is a assigned work.

I got the below from the artice.

my $string = "Word1, Word2, \"Commas, inbetween\", Word3, \"Word4Quoted\", \"Again, commas, inbetween\"";
my @arglist = $string =~ m/("[^"]+"|[^,]+)(?:,\s*)?/g;
map { print $_ , "\n"} @arglist;