Fixing the pandas error BadZipFile("File is not a zip file") when reading an uploaded Excel file (zipfile.BadZipFile: File is not a zip file)

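The business requirement: download a spreadsheet template from the page, edit it, upload it to the backend for parsing, and store its contents in the database. Parsing on the backend, however, raised an error.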

At first I used:

import pandas as pd

df1 = pd.read_excel(
    filename,
    engine='openpyxl',
    index_col=0
)

This kept failing with BadZipFile("File is not a zip file"): zipfile.BadZipFile: File is not a zip file
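A real .xlsx file is a ZIP archive, and openpyxl opens it through Python's zipfile module, which is why a non-ZIP upload produces BadZipFile. As a minimal sketch (the helper name looks_like_xlsx is made up for illustration and is not part of pandas or openpyxl), the uploaded bytes can be checked before they are handed to the openpyxl engine:

```python
import io
import zipfile


def looks_like_xlsx(data: bytes) -> bool:
    """Return True if the bytes form a ZIP archive, as every .xlsx file does.

    Hypothetical helper for illustration only. A True result means the file
    is a ZIP container, not that it is guaranteed to be a valid workbook.
    """
    return zipfile.is_zipfile(io.BytesIO(data))
```

If this returns False for the uploaded file, engine='openpyxl' cannot read it regardless of the file extension.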

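Tracing the pandas source showed that it checks which engine to use for reading the Excel file. In the version shown here, when no engine is passed it defaults to xlrd: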

"""
_engines = {
        "xlrd": _XlrdReader,
        "openpyxl": _OpenpyxlReader,
        "odf": _ODFReader,
        "pyxlsb": _PyxlsbReader,
    }
""" 

def __init__(self, path_or_buffer, engine=None):
    if engine is None:
        engine = "xlrd"
        if isinstance(path_or_buffer, (BufferedIOBase, RawIOBase)):
            if _is_ods_stream(path_or_buffer):
                engine = "odf"
        else:
            ext = os.path.splitext(str(path_or_buffer))[-1]
            if ext == ".ods":
                engine = "odf"
    if engine not in self._engines:
        raise ValueError(f"Unknown engine: {engine}")

So I tried not passing an engine at all, installed xlrd with pip, and relied on the default. It still failed:

xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<html x…

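The cause is that although the file name ends in .xls, its content is not in xls format at all: as the b'<html x… in the error shows, it is HTML markup with other formats nested inside. My uploaded "Excel" file had in fact been converted from HTML, so I switched to pandas' read_html method. The next test complained that the lxml package was missing, so I installed it. Remember to restart PyCharm after installing.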

# read_html returns a list of DataFrames, one per <table> found in the file
df1 = pd.read_html(
    filename
)[0]

Testing again, the data in the uploaded Excel file was read successfully.
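The diagnosis above comes down to looking at what the file actually contains instead of trusting its extension. The following is a rough sketch of that check, assuming only the first few bytes of the upload are needed. The function name sniff_spreadsheet_format and its return labels are invented for this example, and the HTML test is a simple heuristic, not a full parser:

```python
def sniff_spreadsheet_format(head: bytes) -> str:
    """Guess the real format of an uploaded 'Excel' file from its first bytes.

    Hypothetical helper for illustration; the labels are arbitrary.
    """
    # .xlsx is a ZIP container, which starts with the local file header "PK\x03\x04".
    if head.startswith(b"PK\x03\x04"):
        return "xlsx"
    # Legacy .xls is an OLE2 compound file with this 8-byte signature.
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "xls"
    # HTML saved with an .xls extension: skip a UTF-8 BOM and whitespace first.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if text.startswith((b"<html", b"<!doctype html", b"<table")):
        return "html"
    return "unknown"


# Example use with the article's `filename`:
# with open(filename, "rb") as f:
#     kind = sniff_spreadsheet_format(f.read(64))
```

With such a check, "xlsx" would go to pd.read_excel with engine='openpyxl', "xls" to pd.read_excel with xlrd, and "html" to pd.read_html, which is the path that worked for the file in this post.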

Original article: https://blog.csdn.net/weixin_42008966/article/details/128095320
