
Problem reading JSON data in Python

    0littleboy · 2023-02-23 14:26:24 +08:00 · 1700 views

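    While processing JSON data with Pandas recently, I ran into ValueError: Protocol not known.

    Switching to the json library fixed it, but I don't understand why.

    The JSON is just a data object wrapping contestUpcomingContests, which in turn wraps an array with two elements.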

    import json
    import pandas as pd
    
    data = '{"data":{"contestUpcomingContests":[{"containsPremium":false,"title":"\u7b2c 99 \u573a\u53cc\u5468\u8d5b","cardImg":"https://assets.leetcode.cn/aliyun-lc-upload/contest-config/biweekly-contest-99/contest_detail/pc_card.png","titleSlug":"biweekly-contest-99","startTime":1677940200,"duration":5400,"originStartTime":1677940200},{"containsPremium":false,"title":"\u7b2c 334 \u573a\u5468\u8d5b","cardImg":"https://assets.leetcode.cn/aliyun-lc-upload/contest-config/weekly-contest-334/contest_detail/pc_card.png","titleSlug":"weekly-contest-334","startTime":1677378600,"duration":5400,"originStartTime":1677378600}]}}'
    
    df1 = json.loads(data)
    print(df1)
    df2 = pd.read_json(data)
    print(df2)
    
    {'data': {'contestUpcomingContests': [{'containsPremium': False, 'title': '第 99 场双周赛', 'cardImg': 'https://assets.leetcode.cn/aliyun-lc-upload/contest-config/biweekly-contest-99/contest_detail/pc_card.png', 'titleSlug': 'biweekly-contest-99', 'startTime': 1677940200, 'duration': 5400, 'originStartTime': 1677940200}, {'containsPremium': False, 'title': '第 334 场周赛', 'cardImg': 'https://assets.leetcode.cn/aliyun-lc-upload/contest-config/weekly-contest-334/contest_detail/pc_card.png', 'titleSlug': 'weekly-contest-334', 'startTime': 1677378600, 'duration': 5400, 'originStartTime': 1677378600}]}}
    Traceback (most recent call last):
      File "/Users/world/Developer/AlgorithmSharkSpider/test.py", line 8, in <module>
        df2 = pd.read_json(data)
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/util/_decorators.py", line 199, in wrapper
        return func(*args, **kwargs)
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/util/_decorators.py", line 299, in wrapper
        return func(*args, **kwargs)
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/json/_json.py", line 540, in read_json
        json_reader = JsonReader(
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/json/_json.py", line 622, in __init__
        data = self._get_data_from_filepath(filepath_or_buffer)
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/json/_json.py", line 659, in _get_data_from_filepath
        self.handles = get_handle(
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/common.py", line 558, in get_handle
        ioargs = _get_filepath_or_buffer(
      File "/opt/homebrew/lib/python3.9/site-packages/pandas/io/common.py", line 333, in _get_filepath_or_buffer
        file_obj = fsspec.open(
      File "/opt/homebrew/lib/python3.9/site-packages/fsspec/core.py", line 419, in open
        return open_files(
      File "/opt/homebrew/lib/python3.9/site-packages/fsspec/core.py", line 272, in open_files
        fs, fs_token, paths = get_fs_token_paths(
      File "/opt/homebrew/lib/python3.9/site-packages/fsspec/core.py", line 574, in get_fs_token_paths
        chain = _un_chain(urlpath0, storage_options or {})
      File "/opt/homebrew/lib/python3.9/site-packages/fsspec/core.py", line 315, in _un_chain
        cls = get_filesystem_class(protocol)
      File "/opt/homebrew/lib/python3.9/site-packages/fsspec/registry.py", line 208, in get_filesystem_class
        raise ValueError("Protocol not known: %s" % protocol)
    ValueError: Protocol not known: {"data":{"contestUpcomingContests":[{"containsPremium":false,"title":"第 99 场双周赛","cardImg":"https
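
The last traceback line is cut off right at `https`, which hints that the string was handled as a URL: whatever sits before the first `://` appears to have been taken as the protocol name. Below is a minimal sketch of that effect. It uses plain string splitting as a stand-in (an assumption about the behaviour, not the library's real parsing code) and a shortened, hypothetical sample string.

```python
# Shortened stand-in for the JSON string in the post (hypothetical sample).
data = '{"data":{"cardImg":"https://assets.leetcode.cn/pc_card.png"}}'

# Illustration only: split at the first "://", the way a URL would be read.
# This is not fsspec's actual code, just the effect visible in the error text.
protocol, separator, remainder = data.partition("://")

# Everything before "://" ends up where a scheme such as "s3" would be.
print(repr(protocol))
```

If that reading is right, any JSON text containing a URL could trigger the same error on an affected setup, while JSON without `://` would not.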
    
    2 replies    2023-02-23 15:00:19 +08:00
    1 · bomb77 · 2023-02-23 14:41:24 +08:00
    Python 3.8
    pandas 1.5.3
    No error when I tested it.

    Try upgrading, or check whether it's an encoding issue or something like that?
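
Since the two setups behave differently, comparing installed versions is a reasonable first step. Below is a small sketch using only the standard library (Python 3.8+); the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Print the versions of the two libraries that appear in the traceback.
for name in ("pandas", "fsspec"):
    print(name, installed_version(name))
```

Running this in both environments shows whether the pandas or fsspec versions actually differ.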
    2 · dcopen · 2023-02-23 15:00:19 +08:00 · ❤️ 2
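    This problem occurs when using Pandas' read_json() function, which relies on the fsspec library for file handling and reading. fsspec introduced a new URL parsing mechanism after version 0.9.0, and that leads to this error.

    When working with JSON data, you can use the json library to convert it into a Python dictionary, then use Pandas' json_normalize() function to flatten it into a Pandas DataFrame. Here is an example: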

    ```
    import json
    import pandas as pd

    data = '{"data":{"contestUpcomingContests":[{"containsPremium":false,"title":"第 99 场双周赛","cardImg":"https://assets.leetcode.cn/aliyun-lc-upload/contest-config/biweekly-contest-99/contest_detail/pc_card.png","titleSlug":"biweekly-contest-99","startTime":1677940200,"duration":5400,"originStartTime":1677940200},{"containsPremium":false,"title":"第 334 场周赛","cardImg":"https://assets.leetcode.cn/aliyun-lc-upload/contest-config/weekly-contest-334/contest_detail/pc_card.png","titleSlug":"weekly-contest-334","startTime":1677378600,"duration":5400,"originStartTime":1677378600}]}}'

    data_dict = json.loads(data)
    df = pd.json_normalize(data_dict, record_path=['data', 'contestUpcomingContests'])
    print(df)

    ```

    Output:
    ```
       containsPremium       title                                            cardImg            titleSlug   startTime  duration  originStartTime
    0            False  第 99 场双周赛  https://assets.leetcode.cn/aliyun-lc-upload/co...  biweekly-contest-99  1677940200      5400       1677940200
    1            False  第 334 场周赛  https://assets.leetcode.cn/aliyun-lc-upload/co...   weekly-contest-334  1677378600      5400       1677378600

    ```
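
The `record_path` argument in the example above names the keys to walk down to reach the list of records. Here is a rough standard-library sketch of that walk, for anyone who only needs the rows as dictionaries. The helper `records_at` and the shortened sample string are made up for illustration, and this is not how pandas implements it.

```python
import json


def records_at(obj, path):
    """Follow each key in `path` and return the list found at the end."""
    node = obj
    for key in path:
        node = node[key]
    if not isinstance(node, list):
        raise TypeError("expected a list of records at the end of the path")
    return node


# Shortened, hypothetical sample with the same nesting as the post's data.
sample = (
    '{"data":{"contestUpcomingContests":['
    '{"titleSlug":"biweekly-contest-99","duration":5400},'
    '{"titleSlug":"weekly-contest-334","duration":5400}]}}'
)

rows = records_at(json.loads(sample), ["data", "contestUpcomingContests"])
for row in rows:
    print(row["titleSlug"], row["duration"])
```

Each row is a plain dict, so it can be handed to `pd.DataFrame(rows)` later if a table is needed.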