pandas-vet is a plugin for flake8 that provides opinionated linting for pandas code. It began as a project during the PyCascades 2019 sprints.
pandas can be daunting. The usual internet help sites are littered with different ways to do the same thing, and some features that the pandas docs themselves discourage live on in the API. pandas-vet is (hopefully) a way to help make pandas a little more friendly for newcomers by taking some opinionated stances about pandas best practices. It is designed to help users reduce the pandas universe.
Many of the opinions stem from Ted Petrou’s excellent Minimally Sufficient Pandas. Other ideas are drawn from the pandas docs or elsewhere. The Pandas in Black and White flashcards have a lot of the same opinions too.
pandas-vet is a plugin for flake8. If you don’t have flake8 already, it will install automatically when you install pandas-vet.
The plugin is on PyPI and can be installed with:
pip install pandas-vet
pandas-vet is tested under Python 3.5 and 3.6 and should work with later versions as well.
Once installed successfully in an environment that also has flake8 installed, pandas-vet should run whenever flake8 is run.
$ flake8 ...
See the flake8 docs for more information.
For a full list of implemented warnings, see the list below.
pandas-vet is still in the very early stages. Contributions are welcome from the community on code, tests, docs, and just about anything else.
Code of Conduct
Because this project started during the PyCascades 2019 sprints, we adopt the PyCascades minimal expectation that we “Be excellent to each other”. Beyond that, we follow the Python Software Foundation’s Community Code of Conduct.
Steps to contributing
Please submit an issue (or draft PR) first describing the types of changes you’d like to implement.
Fork the repo and create a new branch for your enhancement/fix.
Write code, docs, etc.
Run the tests. We use pytest and flake8 to validate our codebase. The TravisCI integration will complain on pull requests if there are any failing tests or lint violations. To check these locally, run the following commands:
flake8 pandas_vet setup.py tests --exclude tests/data
Push to your forked repo.
Submit a pull request to the parent repo from your branch. Be sure to write a clear message and reference the Issue # that relates to your pull request.
Feel good about giving back to open source projects.
How to add a check to the linter
Write tests. At a minimum, you should have test cases where the linter should catch “bad” pandas and test cases where the linter should allow “good” pandas.
Write your check function in
Run flake8 and pytest on the linter itself (see Steps to contributing).
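To make the steps above concrete, here is a minimal sketch of what a check and its two kinds of test cases could look like. It is not pandas-vet's actual implementation: the function name `check_for_isnull`, the constant `PD003_MESSAGE`, and the return shape are all hypothetical, and the sketch uses only the standard-library `ast` module rather than the plugin's own machinery.

```python
import ast
from typing import List, Tuple

# Hypothetical message text, modelled on PD003 in the list of warnings.
PD003_MESSAGE = "PD003 '.isna' is preferred to '.isnull'; functionality is equivalent"


def check_for_isnull(source: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, message) for every call to a method named `isnull`."""
    errors = []
    for node in ast.walk(ast.parse(source)):
        # A method call is a Call node whose callee is an attribute access.
        is_method_call = isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
        if is_method_call and node.func.attr == "isnull":
            errors.append((node.lineno, node.col_offset, PD003_MESSAGE))
    return errors


def test_isnull_is_flagged():
    # "Bad" pandas: the linter should catch this.
    assert len(check_for_isnull("nulls = df.isnull()")) == 1


def test_isna_is_allowed():
    # "Good" pandas: the linter should stay quiet.
    assert check_for_isnull("nulls = df.isna()") == []
```

A check written this way only looks at the method name, so it would also flag an `isnull` method on a non-pandas object. That trade-off is typical for a purely syntactic linter.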
PyCascades 2019 sprints team
List of implemented warnings (as of v.0.2.0)
PD001: pandas should always be imported as ‘import pandas as pd’
PD002: ‘inplace = True’ should be avoided; it has inconsistent behavior
PD003: ‘.isna’ is preferred to ‘.isnull’; functionality is equivalent
PD004: ‘.notna’ is preferred to ‘.notnull’; functionality is equivalent
PD005: Use arithmetic operator instead of method
PD006: Use comparison operator instead of method
PD007: ‘.ix’ is deprecated; use more explicit ‘.loc’ or ‘.iloc’
PD008: Use ‘.loc’ instead of ‘.at’. If speed is important, use numpy.
PD009: Use ‘.iloc’ instead of ‘.iat’. If speed is important, use numpy.
PD010: ‘.pivot_table’ is preferred to ‘.pivot’ or ‘.unstack’; provides same functionality
PD011: Use ‘.array’ or ‘.to_array()’ instead of ‘.values’; ‘values’ is ambiguous
PD012: ‘.read_csv’ is preferred to ‘.read_table’; provides same functionality
PD013: ‘.melt’ is preferred to ‘.stack’; provides same functionality
PD015: Use ‘.merge’ method instead of ‘pd.merge’ function. They have equivalent functionality.
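As a rough illustration of several of the warnings above, the sketch below writes a handful of operations in the preferred style, with the discouraged form noted in a comment. It assumes pandas is installed. The frames `df` and `other` are made-up sample data. For PD011 the sketch uses `.to_numpy()`, which is the conversion method pandas provides.

```python
import pandas as pd  # PD001: import pandas under the conventional alias

# Made-up sample data, for illustration only.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [10, 20, 30]})
other = pd.DataFrame({"b": [10, 30], "c": ["x", "y"]})

# PD002: assign the result rather than passing inplace=True
filled = df.fillna(0)

# PD003 / PD004: .isna and .notna rather than .isnull and .notnull
missing = df["a"].isna()
present = df["a"].notna()

# PD005 / PD006: operators rather than methods such as .add or .gt
total = df["a"] + df["b"]
large = df["b"] > 15

# PD007 to PD009: .loc and .iloc rather than .ix, .at, or .iat
first_b = df.loc[0, "b"]
same_b = df.iloc[0, 1]

# PD011: an explicit conversion rather than .values
b_values = df["b"].to_numpy()

# PD015: the DataFrame.merge method rather than the pd.merge function
merged = df.merge(other, on="b")
```

Running flake8 with pandas-vet installed over the commented-out alternatives (for example `df["a"].isnull()`) is a quick way to see which codes fire.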