Read .xlsx and text files in a loop using openpyxl

I have a list of .xlsx files (names 1.xlsx, 2.xlsx etc) in a directory, on which I need to insert data from its corresponding text file (named 1.txt, 2.txt etc) on the second worksheet named 'Filtered' and save it. The code I am trying is

#!/usr/bin/python

import os
from openpyxl.reader.excel import load_workbook
from openpyxl.drawing.image import Image
import csv

directoryPath = r'/REPORTS/'
os.chdir(directoryPath)
folder_list = os.listdir(directoryPath)
for folders, sub_folders, file in os.walk(directoryPath):  
    for name in file:
        if name.endswith(".xlsx"):
            filename = os.path.join(folders, name)
            wb = load_workbook(filename, data_only=True)
            ws = wb.get_sheet_by_name('Filtered')

##here is where the text files are stored
            directory = os.path.join("/REPORTS/temp", "path")
            for root,dirs,files in os.walk(directory):
               for file in files:
                    if file.endswith(".txt"):
                        f = open(textfile, 'r')
                        reader = csv.reader(f, delimiter='\t')
                        for row in reader:
                           ws.append(row)
                           wb.save(filename)

Although the code goes through all excel files, it is not going through the text files to write its corresponding result onto the 2nd worksheet of each excel file. Any suggestions to edit the code ?

You are not opening the correct text file. That's the reason your script doesn't work.
Keep in mind that given the Excel file "1.xlsx", you want to open "1.txt".
Here, "1" is the keyword.
As long as the files "N.xlsx" and "N.txt" (where N = 1, 2, 3, ... etc.) are linked via "N", you can exploit this feature in your code as follows:

import os
from openpyxl.reader.excel import load_workbook
from openpyxl.drawing.image import Image
import csv

xl_directory = r'<path_to_Excel_directory>'
txt_directory = r'<path_to_text_directory>'

for xl_root, xl_dirs, xl_files in os.walk(xl_directory):
    for xl_file in xl_files:
        if xl_file.endswith('.xlsx'):
            xl_abs_file = os.path.join(xl_root, xl_file)
            wb = load_workbook(xl_abs_file, data_only = True)
            ws = wb.get_sheet_by_name('Filtered')
  
            # Now search for the corresponding text file.
            # Note: if xl_file = '1.xlsx', then xl_file.replace('xlsx', 'txt')
            # returns '1.txt' - the corresponding text file name. If this doesn't
            # hold true, then a different algorithm will be required.
            for txt_root, txt_dirs, txt_files in os.walk(txt_directory):
                for txt_file in txt_files:
                    if txt_file == xl_file.replace('xlsx', 'txt'):
                        fh = open(txt_file, 'r')
                        reader = csv.reader(fh, delimiter = '\t')
                        for row in reader:
                            ws.append(row)
                            wb.save(xl_abs_file)
2 Likes

Thank you so much. I did try the code edits but it still does not write the text output in the .xlsx worksheet desired.

Also, I get a warning msg (which I think can be ignored?) My excel and text files are named similarly with its respective extensions

/usr/local/lib/python2.7/site-packages/openpyxl/reader/worksheet.py:310: UserWarning: Unknown extension is not supported and will be removed
  warn(msg)

If you do a Google search for the string: "openpyxl UserWarning: Unknown extension is not supported and will be removed", you will see this issue being reported by a lot of users.
In the first search result from StackOverflow, a very good explanation is given about this warning. The extension that they talk about is not a file extension, but "features" added to the Excel specification that openpyxl does not handle.

Keep in mind that openpyxl is an open-source module and is still under development.
And it needs to keep pace with newer extensions in newer versions of Excel specification.

The more concerning fact is that Python did not write anything to your Excel file.
(a) Could you post all versions that you are using?
(b) Also, could you add print statements to track the execution and determine where exactly it fails?
Does it find the Excel files and text files and csv-read the text files and then fail at wb.append() or wb.save()? Does it fail before that?

I tried the posted code in two different laptops with following software versions:

Laptop 1:
Pythonxy installation on Windows 7 64-bit (http://python-xy.github.io/)

1)  Python 2.7.10
2)  openpyxl - don't know the version
3)  Excel - Microsoft Excel 2013

Laptop 2:
Anaconda installation on Debian 8 64-bit (https://docs.continuum.io/anaconda/install-linux.html)

1)  Python 3.6.0 on Anaconda 4.3.1 (64-bit)
2)  openpyxl version 2.4.1
3)  Libreoffice Calc 4.3.3.2

In both cases, I created fresh new text and Excel/Libreoffice files with dummy data in text files and empty worksheets named "Filtered".
The Libreoffice file was saved as "Microsoft Excel 2007/2010/2013 XML (.xlsx)"

I did not see any warning in either case.
The data was written successfully to the "Filtered" worksheet in both cases.

---------- Post updated at 09:58 AM ---------- Previous update was at 09:42 AM ----------

Just remembered one more thing.
Ensure that all text and Excel files are closed when you run your Python script.
In Laptop 1, I remember seeing an error message since Microsoft Excel locks all open files and hence Python cannot write to it.
(This file-locking is a very well-known fact which you probably know, but I thought of mentioning it in any case.)

Hello,
Sorry for the delayed response. yes, all files are closed before running the commands.

These are the versions I am using

1) Python 2.7.6
[GCC 4.8.2] on linux/Ubuntu 3.13.0-24-generic 64 bit
2) openpyxl 2.5.0a1
3) Libre office version 4.2.4.2

*Update* Now when I run the code, it works! I replaced the following

 
fh = open(txt_file, 'r')

to

 
with open (os.path.join(txt_root, txt_file)) as fh:

Just another quick question. The original excel files have some images/fonts/styling but using openpyxl and saving the *new* excel file make all the images and styling go away. Is there a way around to keep the original excel file intact ?

Many thanks for your support.

I am unable to replicate your issue.
Can you post a minimal Excel file, minus all personal/sensitive information, on which you face this issue?

I have uploaded the 1)xlsx file 2) text file 3)script. So the images are only the first two worksheets. After the script has run, the images disappear.

Thank you

Ok, I tested your script with the Excel and text file and the image does disappear.
Looks like openpyxl does not have the capability to retain the images in an Excel file that is opened for reading.
The Internet is littered with rants about this issue. See, for example, here, here, here, and here.

1 Like

Yes, that's what I thought as well. I guess the only option for now would be to insert the image while saving the new excel file.
Thank you for all the support you have provided so far, appreciate it!