How to Easily Work with CSV, JSON, and XML in Python

Originally published by Python大本营, 2019-05-25

Author | K. Delphino

Translator | 刘畅

Editor | Jane

Produced by | Python大本营 (id: pythonnews)

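[Introduction] Thanks to its flexibility and ease of use, Python has become one of the most popular programming languages. Data scientists value it in particular because it makes handling large datasets simple and convenient. In this article we show how to use Python to work with three major data file formats: CSV, JSON, and XML.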

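Today, every technology company is working out a data strategy. They all recognize that deep insight and clean data can each give a company a key competitive advantage. The more effectively data is used, the deeper and less obvious the insights it can yield.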

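Many formats for storing data have emerged over the years, yet the three most commonly used in day-to-day work are still CSV, JSON, and XML. This article shares the simplest ways to handle these three popular formats in Python.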

The CSV Data Format

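CSV files are one of the most common ways to store data. Most of the data in Kaggle competitions is stored in CSV files. Python has a built-in csv library that supports reading and writing; typically, we first read the data into a list.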

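In the code below, calling csv.reader() gives access to all of the CSV data. Calling next() on the reader object reads one row from the CSV, and each subsequent call returns the following row. (The Python 2 spelling csvreader.next() no longer exists in Python 3.) You can also iterate over the reader with a for loop to walk through every row. Make sure every row has the same number of columns; otherwise you may run into errors during processing.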

import csv

filename = "my_data.csv"
fields = []
rows = []

# Reading the csv file; newline='' is what the csv module docs recommend
with open(filename, 'r', newline='') as csvfile:
    # Creating a csv reader object
    csvreader = csv.reader(csvfile)

    # Extracting field names from the first row
    # (next(csvreader) replaces the Python 2 csvreader.next())
    fields = next(csvreader)

    # Extracting each data row one by one
    for row in csvreader:
        rows.append(row)

# Printing out the first 5 rows
for row in rows[:5]:
    print(row)
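The warning about inconsistent column counts can be made concrete. The following is a minimal sketch, not part of the original article: the sample data and the helper name read_consistent_rows are hypothetical, and an in-memory io.StringIO stands in for a real file so the snippet runs on its own.

```python
import csv
import io

# Hypothetical sample standing in for my_data.csv; the last row is one column short.
sample = "Name,Goals,Assists\nEmily,12,18\nKatie,8,24\nJohn,16\n"


def read_consistent_rows(csvfile):
    """Return (fields, good_rows, bad_line_numbers) for an open CSV file object."""
    reader = csv.reader(csvfile)
    fields = next(reader)  # header row
    good_rows, bad_lines = [], []
    for row in reader:
        if len(row) == len(fields):
            good_rows.append(row)
        else:
            # reader.line_num is the number of source lines read so far
            bad_lines.append(reader.line_num)
    return fields, good_rows, bad_lines


fields, rows, bad = read_consistent_rows(io.StringIO(sample))
print(fields)
print(rows)
print(bad)
```

Collecting the offending line numbers instead of raising immediately is one possible design choice; stricter pipelines may prefer to fail on the first malformed row.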

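Writing a CSV file in Python is just as easy. Put the field names in one list and the data rows in a list of lists. This time we create a writer() object and use it to write the data to the file, in much the same way as we read it.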

import csv

# Field names
fields = ['Name', 'Goals', 'Assists', 'Shots']

# Rows of data in the csv file
rows = [
    ['Emily', '12', '18', '112'],
    ['Katie', '8', '24', '96'],
    ['John', '16', '9', '101'],
    ['Mike', '3', '14', '82'],
]

filename = "soccer.csv"

# Writing to the csv file; newline='' avoids blank lines between rows on Windows
with open(filename, 'w', newline='') as csvfile:
    # Creating a csv writer object
    csvwriter = csv.writer(csvfile)

    # Writing the fields
    csvwriter.writerow(fields)

    # Writing the data rows
    csvwriter.writerows(rows)
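When rows are easier to handle by column name than by position, the standard library also offers csv.DictWriter and csv.DictReader. The sketch below is an illustration rather than the article's own method; it writes to an in-memory buffer (an assumption made so that no file is needed) and reads the rows back.

```python
import csv
import io

fields = ['Name', 'Goals', 'Assists', 'Shots']
records = [
    {'Name': 'Emily', 'Goals': 12, 'Assists': 18, 'Shots': 112},
    {'Name': 'Katie', 'Goals': 8, 'Assists': 24, 'Shots': 96},
]

# Write the header and the rows, addressing each column by name
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields)
writer.writeheader()
writer.writerows(records)

# Rewind and read the same data back as one dictionary per row
buffer.seek(0)
read_back = list(csv.DictReader(buffer))
print(read_back[0]['Name'], read_back[0]['Goals'])
```

Note that CSV carries no type information, so the numbers come back as strings and need an explicit int() conversion if arithmetic is required.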

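Of course, if Pandas is installed, working with the data becomes much easier once it has been read into a variable. Reading from a CSV file and writing back to one each take a single line of code!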

import pandas as pd

filename = "my_data.csv"

# Read in the data
data = pd.read_csv(filename)

# Print the first 5 rows
print(data.head(5))

# Write the data to file
data.to_csv("new_data.csv", sep=",", index=False)

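Pandas can also quickly convert tabular data into a list of dictionaries. Once the data is formatted as a list of dictionaries, you can use the dicttoxml library to convert it to XML, or save it as a JSON file!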

import json

import pandas as pd
from dicttoxml import dicttoxml

# Building our dataframe
data = {
    'Name': ['Emily', 'Katie', 'John', 'Mike'],
    'Goals': [12, 8, 16, 3],
    'Assists': [18, 24, 9, 14],
    'Shots': [112, 96, 101, 82],
}

df = pd.DataFrame(data, columns=list(data.keys()))

# Converting the dataframe to a list of dictionaries
# Then save it to file
data_dict = df.to_dict(orient="records")
with open('output.json', 'w') as f:
    json.dump(data_dict, f, indent=4)

# Converting the list of dictionaries to XML
# Then save it to file
xml_data = dicttoxml(data_dict).decode()
with open("output.xml", "w") as f:
    f.write(xml_data)
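If installing dicttoxml is not an option, a list of dictionaries can also be turned into XML with the standard library alone. This is a minimal sketch under stated assumptions: the helper records_to_xml and the tag names root and item are chosen here for illustration, every dictionary key is assumed to be a valid XML tag name, and the output does not reproduce dicttoxml's exact format.

```python
import xml.etree.ElementTree as ET

records = [
    {'Name': 'Emily', 'Goals': 12},
    {'Name': 'Katie', 'Goals': 8},
]


def records_to_xml(records, root_tag='root', item_tag='item'):
    """Build one <item> element per record, with one child element per key."""
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for key, value in record.items():
            child = ET.SubElement(item, key)
            child.text = str(value)  # XML text is always a string
    # encoding='unicode' returns a str instead of bytes
    return ET.tostring(root, encoding='unicode')


xml_text = records_to_xml(records)
print(xml_text)
```

Values are stringified on the way in, so type information is lost unless it is stored separately, for example in an attribute.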

The JSON Data Format

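JSON provides a clean, easy-to-read format built on a dictionary-like structure. As with CSV, Python has a built-in json module that makes reading and writing JSON files very simple! When a JSON file is read, its contents are stored as a dictionary (or, as in the example here, a list of dictionaries), which can then be written back to a file.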

import json

import pandas as pd

# Read the data from file
# We now have a Python list of dictionaries
with open('data.json') as f:
    data_listofdict = json.load(f)

# We can do the same thing with pandas
data_df = pd.read_json('data.json', orient='records')

# We can write the data to JSON like so
# Use 'indent' and 'sort_keys' to make the JSON file look nice
with open('new_data.json', 'w') as json_file:
    json.dump(data_listofdict, json_file, indent=4, sort_keys=True)

# And again the same thing with pandas
# (to_json returns None when given a path, so there is nothing to assign)
data_df.to_json('new_data.json', orient='records')
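To see what indent and sort_keys change, it can help to serialize to a string instead of a file. The short sketch below uses json.dumps and json.loads, the string counterparts of json.dump and json.load; the sample record is made up for illustration.

```python
import json

record = {'Name': 'Emily', 'Goals': 12, 'Assists': 18}

# Default output: a single line, keys in insertion order
compact = json.dumps(record)

# Pretty output: one key per line, keys sorted alphabetically
pretty = json.dumps(record, indent=4, sort_keys=True)

print(compact)
print(pretty)

# Formatting only affects the text; the parsed data is identical
restored = json.loads(pretty)
```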

Once you have the data, it is easy to convert it to CSV using either Pandas or Python's built-in csv module. If you need XML instead, you can use the dicttoxml library.

import csv
import json

# Read the data from file
# We now have a Python list of dictionaries
with open('data.json') as f:
    data_listofdict = json.load(f)

# Writing a list of dicts to CSV
# (text mode with newline='' replaces the Python 2 'wb' mode)
keys = data_listofdict[0].keys()
with open('saved_data.csv', 'w', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(data_listofdict)
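Taking the column names from the first record works only when every record has the same keys; csv.DictWriter raises a ValueError if a later record contains a key that is not in the field list. The sketch below shows one possible way to handle uneven records, with made-up sample data: it collects the union of keys and lets restval fill the gaps.

```python
import csv
import io
import json

# Hypothetical JSON input in which the records do not share the same keys
json_text = '[{"Name": "Emily", "Goals": 12}, {"Name": "Katie", "Assists": 24}]'
records = json.loads(json_text)

# Collect every key that appears in any record, preserving first-seen order
fieldnames = []
for record in records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

# restval is written wherever a record lacks one of the fields
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(records)

csv_text = buffer.getvalue()
print(csv_text)
```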

The XML Data Format

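XML is a bit different from CSV and JSON. CSV and JSON are widely used because they are simple: they are quick and easy to read and write, and easy to interpret. Parsing JSON or CSV is lightweight and needs no extra work.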

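XML, by contrast, is more cumbersome. Sending data in this format takes more bandwidth, storage space, and processing time. But XML does offer features that JSON and CSV lack: for example, you can use namespaces to build and share standard structures, represent inheritance better, and describe data with industry-standard mechanisms such as XML Schema and DTD.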
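As an illustration of the namespace feature, the sketch below parses a small document in which the statistics elements live in their own namespace. The document, the prefix s, and the URI http://example.com/stats are all invented for this example; ElementTree resolves the prefix through the mapping passed to find().

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <goals> belongs to a separate "stats" namespace
xml_text = """<team xmlns:s="http://example.com/stats">
  <player name="Emily"><s:goals>12</s:goals></player>
  <player name="Katie"><s:goals>8</s:goals></player>
</team>"""

root = ET.fromstring(xml_text)

# Map the prefix used in the query to the namespace URI
ns = {'s': 'http://example.com/stats'}

goals = {
    player.get('name'): int(player.find('s:goals', ns).text)
    for player in root.findall('player')
}
print(goals)
```

The prefix in the query does not have to match the prefix in the document; only the URI has to agree.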

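To read XML data, you can use Python's built-in xml module and its ElementTree submodule. From there, the example below uses the xmltodict library to convert the ElementTree object into a dictionary. Once you have a dictionary, you can convert it to CSV, JSON, or a Pandas DataFrame as before!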

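import json
import xml.etree.ElementTree as ET

import xmltodict

# Parse the XML file and get its root element
tree = ET.parse('output.xml')
xml_data = tree.getroot()

# Serialize the root element back to bytes that xmltodict can parse
xmlstr = ET.tostring(xml_data, encoding='utf8', method='xml')

# Convert the XML into a Python dictionary
data_dict = dict(xmltodict.parse(xmlstr))

print(data_dict)

# Save the dictionary as JSON
with open('new_data_2.json', 'w') as json_file:
    json.dump(data_dict, json_file, indent=4, sort_keys=True)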
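For simple documents, a similar conversion can be sketched without xmltodict. The helper element_to_dict below is hypothetical and deliberately limited: it ignores attributes and mixed text, and it does not reproduce xmltodict's output conventions. It only shows the general idea of walking an ElementTree recursively.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an element to nested dicts; leaf elements become their text."""
    children = list(element)
    if not children:
        return element.text
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag is collected into a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Hypothetical input shaped like a simple list of records
xml_text = (
    "<root>"
    "<item><Name>Emily</Name><Goals>12</Goals></item>"
    "<item><Name>Katie</Name><Goals>8</Goals></item>"
    "</root>"
)
root = ET.fromstring(xml_text)
data = {root.tag: element_to_dict(root)}
print(data)
```

All leaf values come back as strings, so numeric fields still need converting before analysis.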

Original article:
https://towardsdatascience.com/the-easy-way-to-work-with-csv-json-and-xml-in-python-5056f9325ca9

(*This article was compiled by Python大本营. To republish it, contact WeChat 1092722531.)
