pyexcel - Let you focus on data, instead of file formats

Author: C.W.
Source code: http://github.com/pyexcel/pyexcel.git
Issues: http://github.com/pyexcel/pyexcel/issues
License: New BSD License
Released: 0.7.0
Generated: Feb 12, 2022

Introduction

pyexcel provides one application programming interface to read, manipulate and write data in various excel formats. This library makes information processing involving excel files an enjoyable task. The data in excel files can be turned into an array or dict with minimal code, and vice versa. This library focuses on data processing using excel files as storage media; hence fonts, colors and charts are not, and will not be, considered.

The idea originated from a common usability problem: an excel-file-driven web application is delivered to non-developer users (e.g. team assistants, human resource administrators). Not everyone knows (or cares) about the differences between the various excel formats: csv, xls and xlsx are all the same to them. Instead of training those users about file formats, this library helps web developers handle most excel file formats by providing a common programming interface. To add a specific excel file format to your application, all you need to do is install an extra pyexcel plugin. Hence no code changes to your application and no more issues with excel file formats. Looking at the community, this library and its associated ones try to be a small, easy-to-install alternative to Pandas.

Support the project

If your company has embedded pyexcel and its components into a revenue-generating product, please support me on GitHub, Patreon or Bountysource to maintain the project and develop it further.

If you are an individual, you are welcome to support me too, for however long you feel like. As my backer, you will receive early access to pyexcel-related content.

Your issues will also be prioritized if you become my patron as a pyexcel pro user.

With your financial support, I will be able to invest a little more time in coding, documentation and writing interesting posts.

Installation

You can install pyexcel via pip:

$ pip install pyexcel

or clone it and install it:

$ git clone https://github.com/pyexcel/pyexcel.git
$ cd pyexcel
$ python setup.py install

Suppose you have the following data in a list of dictionaries:

Name      Age
Adam      28
Beatrice  29
Ceri      30
Dean      26

You can easily save it into an excel file using the following code:

>>> import pyexcel
>>> # make sure you have pyexcel-xls installed
>>> a_list_of_dictionaries = [
...     {
...         "Name": 'Adam',
...         "Age": 28
...     },
...     {
...         "Name": 'Beatrice',
...         "Age": 29
...     },
...     {
...         "Name": 'Ceri',
...         "Age": 30
...     },
...     {
...         "Name": 'Dean',
...         "Age": 26
...     }
... ]
>>> pyexcel.save_as(records=a_list_of_dictionaries, dest_file_name="your_file.xls")

And here’s how to obtain the records:

>>> import pyexcel as p
>>> records = p.iget_records(file_name="your_file.xls")
>>> for record in records:
...     print("%s is aged at %d" % (record['Name'], record['Age']))
Adam is aged at 28
Beatrice is aged at 29
Ceri is aged at 30
Dean is aged at 26
>>> p.free_resources()

Custom data rendering:

>>> # pip install pyexcel-text==0.2.7.1
>>> import pyexcel as p
>>> ccs_insight2 = p.Sheet()
>>> ccs_insight2.name = "Worldwide Mobile Phone Shipments (Billions), 2017-2021"
>>> ccs_insight2.ndjson = """
... {"year": ["2017", "2018", "2019", "2020", "2021"]}
... {"smart phones": [1.53, 1.64, 1.74, 1.82, 1.90]}
... {"feature phones": [0.46, 0.38, 0.30, 0.23, 0.17]}
... """.strip()
>>> ccs_insight2
pyexcel sheet:
+----------------+------+------+------+------+------+
| year           | 2017 | 2018 | 2019 | 2020 | 2021 |
+----------------+------+------+------+------+------+
| smart phones   | 1.53 | 1.64 | 1.74 | 1.82 | 1.9  |
+----------------+------+------+------+------+------+
| feature phones | 0.46 | 0.38 | 0.3  | 0.23 | 0.17 |
+----------------+------+------+------+------+------+

Advanced usage

If you are dealing with large data sets, please consider the following streaming usage:

>>> def increase_everyones_age(generator):
...     for row in generator:
...         row['Age'] += 1
...         yield row
>>> def duplicate_each_record(generator):
...     for row in generator:
...         yield row
...         yield row
>>> records = p.iget_records(file_name="your_file.xls")
>>> io = p.isave_as(records=duplicate_each_record(increase_everyones_age(records)),
...     dest_file_type='csv', dest_lineterminator='\n')
>>> print(io.getvalue())
Age,Name
29,Adam
29,Adam
30,Beatrice
30,Beatrice
31,Ceri
31,Ceri
27,Dean
27,Dean

Two advantages of the above method:

  1. You can add as many wrapping functions as you want.
  2. Memory consumption stays constant.
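
The wrapping idea above does not depend on pyexcel itself. As a minimal sketch in plain Python, assuming the records come from csv text rather than an excel file, each stage below is a generator that handles one record at a time. The names read_records, increase_age and only_under are hypothetical helpers chosen for illustration, not part of pyexcel:

```python
import csv
import io


def read_records(text):
    """Yield one dict per csv row, lazily (hypothetical helper)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"Name": row["Name"], "Age": int(row["Age"])}


def increase_age(records):
    """Wrapping stage: add one year to every record."""
    for record in records:
        yield {**record, "Age": record["Age"] + 1}


def only_under(records, limit):
    """Wrapping stage: keep records whose age is below the limit."""
    for record in records:
        if record["Age"] < limit:
            yield record


source = "Name,Age\nAdam,28\nBeatrice,29\nCeri,30\nDean,26\n"

# Stages are chained by nesting; nothing runs until the result is consumed.
pipeline = only_under(increase_age(read_records(source)), limit=31)
result = list(pipeline)
```

Each stage holds a single record at a time. In real use the final consumer would write rows out as they arrive instead of collecting them in a list.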

Support for individual excel file formats comes from plugins; please install the ones you need:

A list of file formats supported by external plugins

Package name  Supported file formats                 Dependencies
------------  -------------------------------------  -------------------
pyexcel-io    csv, csvz [1], tsv, tsvz [2]
pyexcel-xls   xls, xlsx(read only), xlsm(read only)  xlrd, xlwt
pyexcel-xlsx  xlsx                                   openpyxl
pyexcel-ods3  ods                                    pyexcel-ezodf, lxml
pyexcel-ods   ods                                    odfpy

Dedicated file readers and writers

Package name      Supported file formats   Dependencies
----------------  -----------------------  --------------
pyexcel-xlsxw     xlsx(write only)         XlsxWriter
pyexcel-libxlsxw  xlsx(write only)         libxlsxwriter
pyexcel-xlsxr     xlsx(read only)          lxml
pyexcel-xlsbr     xlsb(read only)          pyxlsb
pyexcel-odsr      read only for ods, fods  lxml
pyexcel-odsw      write only for ods       loxun
pyexcel-htmlr     html(read only)          lxml, html5lib
pyexcel-pdfr      pdf(read only)           camelot

Plugin shopping guide

Since 2020, all pyexcel-io plugins have dropped support for Python versions lower than 3.6. If you need to use one of those older Python versions, please use versions of pyexcel-io and its plugins that are lower than 0.6.0.

Unlike csv files, xlsx and ods files are zip archives of a folder containing many xml files; xls is a binary format.

The dedicated readers for excel files can read them as a stream.

To manage the list of installed plugins, use pip to add or remove a plugin. When you use virtualenv, you can have different plugins per virtual environment. If several plugins in your environment do the same thing, you need to tell pyexcel which plugin to use per function call. For example, if both pyexcel-ods and pyexcel-odsr are installed and you want get_array to use pyexcel-odsr, pass the library argument: get_array(…, library='pyexcel-odsr').
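
As a minimal sketch of the per-call plugin choice described above, the hypothetical helper below (get_array_with is not part of pyexcel) forwards an optional library name to get_array. It assumes pyexcel and the named plugin are already installed with pip:

```python
def get_array_with(file_name, library=None):
    """Read a file into a list of rows, optionally forcing one reader plugin."""
    import pyexcel  # imported lazily so the helper can be defined without pyexcel

    keywords = {"file_name": file_name}
    if library is not None:
        # e.g. "pyexcel-odsr"; the plugin itself must be installed separately
        keywords["library"] = library
    return pyexcel.get_array(**keywords)
```

A call might look like get_array_with("your_file.ods", library="pyexcel-odsr"), where your_file.ods is a placeholder file name. Leaving library unset lets pyexcel pick a plugin on its own.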

Other data renderers

Package name          Supported file formats                    Dependencies  Python versions
--------------------  ----------------------------------------  ------------  ----------------------------------
pyexcel-text          write only: rst, mediawiki, html, latex,  tabulate      2.6, 2.7, 3.3, 3.4, 3.5, 3.6, pypy
                      grid, pipe, orgtbl, plain simple
                      read only: ndjson
                      r/w: json
pyexcel-handsontable  handsontable in html                      handsontable  same as above
pyexcel-pygal         svg chart                                 pygal         2.7, 3.3, 3.4, 3.5, 3.6, pypy
pyexcel-sortable      sortable table in html                    csvtotable    same as above
pyexcel-gantt         gantt chart in html                       frappe-gantt  except pypy, same as above

Footnotes

[1] zipped csv file
[2] zipped tsv file

For compatibility tables of pyexcel-io plugins, please click here

Plugin compatibility table

pyexcel   pyexcel-io  pyexcel-text  pyexcel-handsontable  pyexcel-pygal  pyexcel-gantt
--------  ----------  ------------  --------------------  -------------  -------------
0.6.5+    0.6.2+      0.2.6+        0.0.1+                0.0.1          0.0.1
0.5.15+   0.5.19+     0.2.6+        0.0.1+                0.0.1          0.0.1
0.5.14    0.5.18      0.2.6+        0.0.1+                0.0.1          0.0.1
0.5.10+   0.5.11+     0.2.6+        0.0.1+                0.0.1          0.0.1
0.5.9.1+  0.5.9.1+    0.2.6+        0.0.1                 0.0.1          0.0.1
0.5.4+    0.5.1+      0.2.6+        0.0.1                 0.0.1          0.0.1
0.5.0+    0.4.0+      0.2.6+        0.0.1                 0.0.1          0.0.1
0.4.0+    0.3.0+      0.2.5

A list of supported file formats

File format  Definition
-----------  ---------------------------------------------------------------------
csv          comma separated values
tsv          tab separated values
csvz         a zip file that contains one or many csv files
tsvz         a zip file that contains one or many tsv files
xls          a spreadsheet file format created by MS-Excel 97-2003
xlsx         MS-Excel Extensions to the Office Open XML SpreadsheetML File Format.
xlsm         an MS-Excel Macro-Enabled Workbook file
ods          open document spreadsheet
fods         flat open document spreadsheet
json         JavaScript Object Notation
html         html table of the data structure
simple       simple presentation
rst          reStructuredText presentation of the data
mediawiki    media wiki table
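
To make the csvz definition above concrete, here is a minimal sketch using only the Python standard library. It packs several sheets into one zip archive, one csv member per sheet, and reads them back. The helpers write_csvz and read_csvz, and the "<sheet name>.csv" member naming, are assumptions for illustration and may differ from what pyexcel-io actually writes:

```python
import csv
import io
import zipfile


def write_csvz(sheets):
    """Pack {sheet_name: rows} into zip bytes, one csv member per sheet."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, rows in sheets.items():
            text = io.StringIO()
            csv.writer(text, lineterminator="\n").writerows(rows)
            # Member naming is an assumption made for this sketch.
            archive.writestr(name + ".csv", text.getvalue())
    return buffer.getvalue()


def read_csvz(data):
    """Unpack zip bytes back into {sheet_name: rows}; every cell is a string."""
    sheets = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for member in archive.namelist():
            text = archive.read(member).decode("utf-8")
            sheet_name = member[: -len(".csv")]
            sheets[sheet_name] = list(csv.reader(io.StringIO(text)))
    return sheets
```

Note that plain csv carries no type information, so values such as ages come back as strings unless the reader converts them.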

Usage

Suppose you want to process the following excel data:

Name      Age
Adam      28
Beatrice  29
Ceri      30
Dean      26

Here is an example usage:

>>> import pyexcel as pe
>>> records = pe.iget_records(file_name="your_file.xls")
>>> for record in records:
...     print("%s is aged at %d" % (record['Name'], record['Age']))
Adam is aged at 28
Beatrice is aged at 29
Ceri is aged at 30
Dean is aged at 26
>>> pe.free_resources()
