
English Share | The Good and the Bad of Python in 2018

Python猫 (Python Cat), 2019-04-28

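It has been a while since I last shared an English blog post with you. I hope your English reading skills have not slipped (and if they have, nobody would admit it :)). A few days ago my feed was flooded with news of friends taking the CET-4 and CET-6 exams; I wonder how everyone did. I graduated a few years ago and no longer need to study English for exams, but I have come to realize that, besides programming skills, English is something I cannot afford to lose. So I have started building a habit of reading English material (two weeks ago I even tried translating an article), and sharing English articles on this official account is another worthwhile experiment. A reader once left a comment saying that following this account also lets him practice his English, which he thought was great. That reply boosted my confidence a lot, so these shares will continue. I will keep the frequency in check and mark each one as an English share in the title so it is easy to tell apart. Today's share is a year-in-review article about Python from Medium. The author splits it into Good and Bad and covers several important modules and tools, such as JupyterLab, mypy, Pipfile and pipenv, f-strings, and more. I hope it helps. (PS: the Python猫 reader group has been set up; see today's second post for details.)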


Original title: State of Python in 2018

Author: Daniel Kats

Original article: http://t.cn/E42RMi9 (abridged)


I love Python. I’ve been using Python for almost 10 years now, across projects both personal and professional. My work is equal parts data analytics and rapid prototyping, so Python is a natural fit. The great draw of Python is that it has packages for everything: machine learning, data exploration, reproducible research, visualization, cloud functionality, web APIs, and the kitchen sink.

However, as with any engineering effort, Python is a work-in-progress. Our perception of the language today is different than it was even five years ago, so things that may have seemed outlandish then are now not only possible, but logical. In this post, I want to lay out what I see as promising directions for the community, and how I would like to see it grow.

The Good

Many good things have either landed in 2018 in Pythonland, or have overcome their growing pains. Here are my personal favourites:

JupyterLab

A Jupyter Notebook is a web application to execute Python (and other languages) and view the results in-line including graphs, prettified tables, and markdown-formatted prose. It also automatically saves intermediate results (similar to a REPL), allows exporting to many formats, and has a hundred other features. For a deeper dive, see my PyCon talk. Jupyter Notebooks are very widely used in the community, especially those in research and scientific fields. The Jupyter team very justifiably won the 2017 ACM Software System Award.

JupyterLab is an exciting improvement over traditional Jupyter notebooks. It includes some compelling features like cell drag-and-drop, inline viewing of data files (like CSV), a tabbed environment, and a more command-centered interface. It definitely still feels like a beta, with some glitches in Reveal.js slide export functionality and cell collapse not working as expected. But on the whole it’s a perfect example of a good tool getting even better and growing to fit the sophistication of its users.

mypy

mypy, a static type checking tool for Python, has existed for a while. However, it has gotten really good this year, to the point where you can integrate it into your production project as part of git hooks or another CI flow. I find it an extremely helpful addition to all codebases, catching the vast majority of my mistakes before I write a single line of test code. It’s not without pitfalls, however. There are many cases where you have to make annotations that feel burdensome, such as

__init__(self, *args) -> None

and there is other behaviour which I view as just strange. The lack of typeshed files for many common modules, such as:

  • flask

  • msgpack

  • coloredlogs

  • flask-restplus

  • sqlalchemy

  • nacl

continues to be an issue in integrating this into your CI system without significant configuration. The --ignore-missing-imports option becomes basically mandatory. In the future, I hope that it becomes a community standard to provide typeshed files for all modules intended to be used as libraries.
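To make the "catching mistakes before writing tests" point concrete, here is a minimal sketch of the kind of annotation mypy works with. The function names are hypothetical and not from the original article; the comment describes what a type checker would be expected to report, not output that was captured.

```python
from typing import List, Optional


def first_even(numbers: List[int]) -> Optional[int]:
    """Return the first even number, or None if there is none."""
    for number in numbers:
        if number % 2 == 0:
            return number
    return None


def describe(numbers: List[int]) -> str:
    value = first_even(numbers)
    # Without this check, a type checker such as mypy would be expected to
    # reject `value + 1`, because `value` may be None.
    if value is None:
        return "no even number"
    return f"next odd number: {value + 1}"
```

The annotations do not change how the code runs; they only give the checker enough information to point at the missing None check before any test is written.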

Pipfile and pipenv

I’m really excited about Pipfiles! Pipfiles draw on PEP 508 (the dependency specification format) and are intended as a replacement for requirements.txt in dependency management.

The top-level motivation is that dependency management with pip feels stale compared to similar systems in other languages like Rust and JavaScript. While the flaws with pip/requirements.txt seem to be well-known in the community, the closest article I’ve seen to an enumeration is this post. I recommend a read, but here is a TLDR:

There is no standard for requirements.txt: is it an enumeration of all primary and secondary dependencies, or just the strict requirements? Does it include pinned versions? Additionally, splitting out development-time requirements is very ad-hoc. Different groups do different things, which makes reproducible builds a problem.

Keeping the list of dependencies up to date required pip install $package followed by pip freeze > requirements.txt, which was a really clunky workflow with a ton of problems.

The dependency-management ecosystem consists of three tools and standards (virtualenv, pip, and requirements.txt) which do not interoperate cleanly. Since you’re trying to accomplish a single task, why isn’t there a single tool to help?

Enter pipenv.

Pipenv creates a virtualenv automatically, installs and manages dependencies in that virtualenv, and keeps the Pipfile updated.

While the idea is great, using it is very cumbersome. I’ve run into many issues using it in practice and often have to fall back on the previous way of doing things — using an explicit virtualenv for example. I also found that locking is very slow (a problem partially stemming from the setup.py standard, which is the source of many other issues in the tooling ecosystem).

f-strings

f-strings are fantastic! Many others have written about the joy of f-strings, from their natural syntax to the performance improvements they bring. I see no reason to repeat these points; I just want to say they are an amazing feature that I have been using regularly since they landed.

An annoyance they introduce is the dichotomy between writing print statements and logging statements. The logging module is great, and by default does not format strings if that log message is turned off. So you might write:

import logging
x = 3
logging.debug('x=%d', x)

Which would print x=3 if the log-level is set to DEBUG, but would not even perform the string interpolation if the log-level is set higher. This is because logging.debug is a function, and the format string and its arguments are passed to it separately. You can see how it works in the very readable source code of the logging module. However, this functionality disappears if you write the following:

x = 3
logging.debug(f'x={x}')

The string interpolation happens regardless of log-level. This makes sense at a language-level, but the practical consequences are irritating in my natural workflow. I write print statements first when debugging my code, and when it looks like everything is right I transform them into logging statements later. So each print statement has to be manually rewritten to fit the different type of string interpolation. I don’t have a good idea of how to solve this problem, but I want to point it out as I haven’t seen anyone else write about this particular problem.
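The difference described above can be observed directly. The following is a small sketch, not part of the original article: the Noisy class and the logger name are hypothetical helpers that count how often an object is converted to a string. The last call shows one possible workaround, guarding the f-string with the standard isEnabledFor check, at the cost of an extra line.

```python
import logging


class Noisy:
    """Hypothetical helper that counts how often it is turned into a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "noisy"


logger = logging.getLogger("fstring_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)  # DEBUG messages are turned off

lazy = Noisy()
logger.debug("value=%s", lazy)  # arguments are passed along but never formatted

eager = Noisy()
logger.debug(f"value={eager}")  # the f-string is built before debug() is called

guarded = Noisy()
if logger.isEnabledFor(logging.DEBUG):  # skipped, so the f-string is never built
    logger.debug(f"value={guarded}")

print(lazy.str_calls, eager.str_calls, guarded.str_calls)
```

With the level set to INFO, only the unguarded f-string pays for the string conversion.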

The Bad

As with any project that has been around for as long as Python (wow it’s as old as I am), there are modules and ideas which are showing their age. This is not meant to be a shade-throwing contest, but laying down the gauntlet to say we as a community can do better.

tox

Tox is still the best (or perhaps more accurately the de-facto) test-runner we have in Pythonland, and it’s quite bad. Not only is the syntax for tox.ini files a bit unintuitive, the tool is also extremely slow. It’s not really tox’s fault, as the whole setup.py system is broken by design. Because these files declare package dependencies and at the same time can execute code, discovering dependencies is inherently slow. This leads to slowness in a number of tools. I believe this is something we should tackle as a community in 2019.

As an aside, there is still no Pipfile support, which makes the value proposition of using it much lower. As with everything, it’s not just about how good the idea is, but the tooling support around it.

type annotations are for tools only

Quoting from PEP 484:

Using type hints for performance optimizations is left as an exercise for the reader.

This is understandable given the state of Python at the time that the PEP was written, but it’s now time to move on. We have successfully transitioned to Python 3, and 359/360 of the most commonly downloaded packages on PyPI are Python 3-compliant. Type hints are here to stay, and are well-loved by the community. Moving forward, Python type hints should carry additional benefits such as performance optimization and automatic runtime type assertions. I find runtime type assertions to be both extremely helpful (especially in libraries), and very cumbersome to write manually. With type hints, this is especially annoying as you have to maintain multiple sources of truth for types.

As others have written, Python 4 will probably have JIT as a first-class feature. This seems like a logical place to add performance optimization in response to type annotations.
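As an illustration of what automatic runtime type assertions could look like today, here is a minimal sketch of a decorator that reuses the type hints as the single source of truth. The enforce_types decorator and the repeat function are hypothetical names invented for this example; the sketch only checks annotations that are plain classes and skips everything else (for example typing.List[int]), so it is far from a complete solution.

```python
import functools
import inspect
from typing import get_type_hints


def enforce_types(func):
    """Hypothetical decorator: check arguments against plain-class annotations."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; other annotations are skipped.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def repeat(text: str, times: int) -> str:
    return text * times
```

Calling repeat("ab", 2) returns "abab", while repeat("ab", "2") raises a TypeError from the decorator before the function body runs.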

variable mutability

One of my biggest gripes with Python right now is the lack of const or its equivalent. Of all the mistakes I make during coding, a solid 90% of them can be traced to either type-related mistakes (now mostly caught with mypy) or accidental reuse of a previous variable within the same function when I thought I was creating a new variable. I understand that there are packages for this, but I want const to be a first-class citizen.
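For context, the standard library gained typing.Final in Python 3.8, after this article was written. It is a partial answer at best: the sketch below, with a hypothetical constant and function, marks a module-level name as final so that a type checker such as mypy can flag reassignment, but the interpreter itself does not enforce anything at runtime.

```python
from typing import Final  # in the standard library since Python 3.8

TAX_RATE: Final = 0.25  # hypothetical constant, used only for illustration


def with_tax(price: float) -> float:
    # A type checker would be expected to reject a later `TAX_RATE = ...`
    # at module level; at runtime the name behaves like any other variable.
    return price * (1 + TAX_RATE)
```

This covers declared constants only, so accidental reuse of an ordinary local variable, the case described above, still goes unnoticed.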

nbconvert

The nbconvert project is, on the whole, amazing. It allows the conversion of Jupyter notebooks into various other formats including PDF, Reveal.js slides, or an executable script. I have used the last two extensively in the past couple of months, and they have honestly changed my workflow. I can put together a notebook, then at the last moment convert it into a presentation for a weekly meeting with my colleagues to show my progress. Similarly, I can develop an idea in a notebook, then convert it into a script and put it into production with minimal changes.

That’s the idea, anyway. The reality is that the scripts produced from any sizable notebook require so much manual effort to convert that it’s often worth it to write them from scratch using cut-and-paste. I heard from a few companies that they have created wrappers around nbconvert to make it a bit more wieldy. I encourage these folks to open-source these contributions, if only to alleviate my personal pain.
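To give a feel for what such a wrapper might do, here is a deliberately naive sketch that does not use nbconvert at all. It relies only on the fact that a notebook file is JSON with a list of cells, each carrying a cell_type and a source. The function name is hypothetical, and the filtering rule (dropping lines that start with % or !) is an assumption about what usually needs cleaning up, not a description of any company's tooling.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Join the code cells of a notebook (given as JSON text) into one script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines
            source = "".join(source)
        # Naive cleanup: drop lines that look like IPython magics or shell escapes.
        lines = [
            line for line in source.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        if lines:
            chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"
```

A real wrapper would also need to deal with display-only cells, cell ordering, and exploratory code, which is where most of the manual effort mentioned above tends to go.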

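A fake visitor from Planet Meow

A fun and useful platform for learning and sharing

Focused on Python, data science, and deep learning

A geek mindset with a humanist touch

You are welcome to follow

WeChat ID: python_cat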
