Pandas vs Python CSV module

I was wondering: is there any occasion where the csv module would be preferred over pandas? I skipped learning the csv module and jumped straight into the beautiful pandas and its magical ability to manipulate data. DataFrames are beautiful!

Moved to "Programming", since the question wasn't about shell scripting.

Your question amounts to the one that always has to be asked about Perl and Python:

How many rounds of installation whack-a-mole will you put yourself and others through when you take this script anywhere else?

There are always beautiful modules that simplify your program to a one-liner, but who has them installed? Having to install modules before a script will run effectively means it's only useful to someone with system-administrator rights.
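That's the argument for the csv module: it ships with every Python install, so a script using it runs anywhere without pip. A minimal sketch, assuming a hypothetical data.csv with a header row:

```python
# Stdlib-only CSV reading: nothing to install, runs on any Python.
# "data.csv" is just a placeholder file name with an assumed header row.
import csv

with open("data.csv", newline="") as f:
    reader = csv.DictReader(f)
    rows = list(reader)  # each row is a dict mapping column name -> string

print(f"Read {len(rows)} rows; columns: {reader.fieldnames}")
```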


If you are on Windows, open Resource Monitor (hit Windows+R, then type "resmon").
If you are running macOS or Linux, there are similar tools.

Run the program and check the number of hard faults and the amount of physical memory used. You can tick the checkbox next to the Python process to get the numbers for that process only. If there are a lot of hard faults and physical memory usage is high, you don't have enough RAM.
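If you'd rather get numbers from inside the script instead of an external monitor, a rough stdlib-only sketch (Unix-only; note the unit difference in ru_maxrss between platforms):

```python
# Rough sketch (Unix-only, stdlib) of logging peak memory from inside the
# script. ru_maxrss is reported in kilobytes on Linux but bytes on macOS.
import resource
import sys

def peak_memory_mb() -> float:
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss / (1024 * 1024) if sys.platform == "darwin" else rss / 1024

print(f"Peak memory so far: {peak_memory_mb():.1f} MB")
```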

You could also do some testing with smaller .csv files. If a file with only x% of the entries takes x% of the time, then RAM is not the problem. If, for small x (where less memory is needed and the whole thing might fit in your RAM), it takes less than x% of the time, then RAM is the problem.
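One cheap way to run that test without creating separate files is the nrows argument of pandas' read_csv; "big.csv" here is just a placeholder for the slow file:

```python
# Sketch of the scaling test using pandas' nrows argument instead of
# separate smaller files. "big.csv" is a placeholder for the slow file.
import time
import pandas as pd

for n in (100_000, 200_000, 400_000, 800_000):
    start = time.perf_counter()
    pd.read_csv("big.csv", nrows=n)
    elapsed = time.perf_counter() - start
    print(f"{n:>9} rows: {elapsed:6.2f} s ({n / elapsed:,.0f} rows/s)")
```

If rows per second stays roughly flat as n doubles, memory isn't the bottleneck.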

I don't know about the library you are using, but if possible log the elapsed time every 10,000 (or so) lines parsed. This will let you monitor whether the rate at which the file is parsed stays constant or slows down.
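With the plain csv module that might look something like the sketch below (file name is again a placeholder); if you're using pandas, the chunksize argument of read_csv gives you similar checkpoints.

```python
# Progress logging with the stdlib csv module: report elapsed time and the
# running parse rate every 10,000 rows, so a slowdown is easy to spot.
import csv
import time

start = time.perf_counter()
with open("big.csv", newline="") as f:
    for i, row in enumerate(csv.reader(f), start=1):
        if i % 10_000 == 0:
            elapsed = time.perf_counter() - start
            print(f"{i:>9} rows in {elapsed:6.1f} s ({i / elapsed:,.0f} rows/s)")
```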