pyexcel - Let you focus on data, instead of file formats
Author: | C.W. |
---|---|
Source code: | http://github.com/pyexcel/pyexcel.git |
Issues: | http://github.com/pyexcel/pyexcel/issues |
License: | New BSD License |
Released: | 0.6.1 |
Generated: | May 02, 2020 |
Introduction
pyexcel provides one application programming interface to read, manipulate and write data in various excel formats. This library makes information processing involving excel files an enjoyable task. The data in excel files can be turned into an array or a dict with minimal code, and vice versa. This library focuses on data processing, using excel files as storage media; hence fonts, colors and charts were not, and will not be, considered.
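To make the "array or dict" shapes concrete, here is a plain-Python sketch (no pyexcel involved) that converts a two-dimensional array, whose first row is assumed to be the header, into a dict mapping each header to its column, and back again. The helper names are hypothetical, and the column-keyed layout is one common convention rather than a precise description of what pyexcel returns.

```python
from typing import Any, Dict, List


def array_to_column_dict(rows: List[List[Any]]) -> Dict[Any, List[Any]]:
    """Hypothetical helper: the first row becomes the keys, each key maps to its column."""
    header, *body = rows
    return {key: [row[index] for row in body] for index, key in enumerate(header)}


def column_dict_to_array(columns: Dict[Any, List[Any]]) -> List[List[Any]]:
    """Hypothetical inverse: header row first, then one row per position in the columns."""
    header = list(columns)
    body = [list(values) for values in zip(*columns.values())]
    return [header] + body


rows = [["Name", "Age"], ["Adam", 28], ["Beatrice", 29]]
columns = array_to_column_dict(rows)
print(columns)  # {'Name': ['Adam', 'Beatrice'], 'Age': [28, 29]}
assert column_dict_to_array(columns) == rows
```

The round trip assumes every column has the same length; `zip` would silently truncate ragged columns.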
The idea originated from a common usability problem: an excel-file-driven web application is delivered to non-developer users (e.g. team assistants, human resource administrators). Not everyone knows (or cares) about the differences between the various excel formats: csv, xls and xlsx are all the same to them. Instead of training those users about file formats, this library helps web developers handle most excel file formats through a common programming interface. To add support for a specific excel file format to your application, all you need to do is install an extra pyexcel plugin: no code changes to your application, and no more issues with excel file formats. Within the Python community, this library and its companion packages aim to be a small, easy-to-install alternative to Pandas.
Support the project
If your company has embedded pyexcel and its components into a revenue-generating product, please support me on Patreon or Bountysource to maintain the project and develop it further.
If you are an individual, you are welcome to support me too, for however long you like. As my backer, you will receive early access to pyexcel-related content.
Your issues will also get prioritized if you become my patron as a pyexcel pro user.
With your financial support, I will be able to invest a little more time in coding, documentation and writing interesting posts.
Installation
You can install pyexcel via pip:
$ pip install pyexcel
or clone it and install it:
$ git clone https://github.com/pyexcel/pyexcel.git
$ cd pyexcel
$ python setup.py install
Support for individual excel file formats comes from separate plugins; please install the ones you need:
Package name | Supported file formats | Dependencies | Python versions |
---|---|---|---|
pyexcel-io | csv, csvz [1], tsv, tsvz [2] | | 2.6, 2.7, 3.3, 3.4, 3.5, 3.6, pypy |
pyexcel-xls | xls, xlsx(read only), xlsm(read only) | xlrd, xlwt | same as above |
pyexcel-xlsx | xlsx | openpyxl | same as above |
pyexcel-ods3 | ods | pyexcel-ezodf, lxml | 2.6, 2.7, 3.3, 3.4, 3.5, 3.6 |
pyexcel-ods | ods | odfpy | same as above |
Package name | Supported file formats | Dependencies | Python versions |
---|---|---|---|
pyexcel-xlsxw | xlsx(write only) | XlsxWriter | Python 2 and 3 |
pyexcel-xlsxr | xlsx(read only) | lxml | same as above |
pyexcel-xlsbr | xlsb(read only) | pyxlsb | same as above |
pyexcel-odsr | read only for ods, fods | lxml | same as above |
pyexcel-odsw | write only for ods | loxun | same as above |
pyexcel-htmlr | html(read only) | lxml,html5lib | same as above |
pyexcel-pdfr | pdf(read only) | pdftables | Python 2 only. |
Package name | Supported file formats | Dependencies | Python versions |
---|---|---|---|
pyexcel-text | write only: rst, mediawiki, html, latex, grid, pipe, orgtbl, plain, simple; read only: ndjson; r/w: json | tabulate | 2.6, 2.7, 3.3, 3.4, 3.5, 3.6, pypy |
pyexcel-handsontable | handsontable in html | handsontable | same as above |
pyexcel-pygal | svg chart | pygal | 2.7, 3.3, 3.4, 3.5, 3.6, pypy |
pyexcel-sortable | sortable table in html | csvtotable | same as above |
pyexcel-gantt | gantt chart in html | frappe-gantt | except pypy, same as above |
Plugins are managed with pip: install or uninstall a plugin to add or remove support for its formats. When you use virtualenv, you can have different plugins per virtual environment. If several installed plugins handle the same format, you need to tell pyexcel which one to use on each function call. For example, if both pyexcel-ods and pyexcel-odsr are installed and you want get_array to use pyexcel-odsr, append the library argument: get_array(…, library='pyexcel-odsr').
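The per-call choice can be pictured as a lookup keyed by file type, with an optional override naming the plugin. The sketch below is a toy model under that assumption only: `REGISTRY`, `register` and `toy_get_array` are hypothetical names, the stand-in readers do no real parsing, the "first registered wins" default is an invented policy, and pyexcel's actual plugin machinery differs. With real pyexcel you simply pass `library='pyexcel-odsr'` as described above.

```python
from typing import Callable, Dict, List, Optional

Reader = Callable[[str], List[List[str]]]

# Hypothetical registry: file type -> {plugin name: reader}.
# A toy model of per-call plugin selection, not pyexcel's internals.
REGISTRY: Dict[str, Dict[str, Reader]] = {}


def register(file_type: str, library: str, reader: Reader) -> None:
    """Record that `library` can read files of `file_type`."""
    REGISTRY.setdefault(file_type, {})[library] = reader


def toy_get_array(file_name: str, library: Optional[str] = None) -> List[List[str]]:
    """Pick a reader by file extension; an explicit `library` overrides the default."""
    file_type = file_name.rsplit(".", 1)[-1].lower()
    readers = REGISTRY.get(file_type, {})
    if not readers:
        raise ValueError(f"no plugin installed for {file_type!r}")
    if library is None:
        # Toy policy (an assumption): fall back to the first plugin registered.
        library = next(iter(readers))
    if library not in readers:
        raise ValueError(f"{library!r} cannot read {file_type!r} files")
    return readers[library](file_name)


# Two stand-in plugins for the same format; each reader just reports who ran.
register("ods", "pyexcel-ods", lambda name: [["pyexcel-ods", name]])
register("ods", "pyexcel-odsr", lambda name: [["pyexcel-odsr", name]])

print(toy_get_array("data.ods"))                          # [['pyexcel-ods', 'data.ods']]
print(toy_get_array("data.ods", library="pyexcel-odsr"))  # [['pyexcel-odsr', 'data.ods']]
```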
Footnotes
[1] | zipped csv file |
[2] | zipped tsv file |
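For compatibility tables of pyexcel-io plugins, please see the pyexcel-io documentation. The table below lists compatible versions of pyexcel and its companion packages: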
pyexcel | pyexcel-io | pyexcel-text | pyexcel-handsontable | pyexcel-pygal | pyexcel-gantt |
---|---|---|---|---|---|
0.5.15+ | 0.5.19+ | 0.2.6+ | 0.0.1+ | 0.0.1 | 0.0.1 |
0.5.14 | 0.5.18 | 0.2.6+ | 0.0.1+ | 0.0.1 | 0.0.1 |
0.5.10+ | 0.5.11+ | 0.2.6+ | 0.0.1+ | 0.0.1 | 0.0.1 |
0.5.9.1+ | 0.5.9.1+ | 0.2.6+ | 0.0.1 | 0.0.1 | 0.0.1 |
0.5.4+ | 0.5.1+ | 0.2.6+ | 0.0.1 | 0.0.1 | 0.0.1 |
0.5.0+ | 0.4.0+ | 0.2.6+ | 0.0.1 | 0.0.1 | 0.0.1 |
0.4.0+ | 0.3.0+ | 0.2.5 | | | |
file format | definition |
---|---|
csv | comma separated values |
tsv | tab separated values |
csvz | a zip file that contains one or many csv files |
tsvz | a zip file that contains one or many tsv files |
xls | a spreadsheet file format created by MS-Excel 97-2003 [f1] |
xlsx | MS-Excel Extensions to the Office Open XML SpreadsheetML File Format. [f2] |
xlsm | an MS-Excel Macro-Enabled Workbook file |
ods | open document spreadsheet |
fods | flat open document spreadsheet |
json | JavaScript Object Notation |
html | html table of the data structure |
simple | simple presentation |
rst | reStructuredText presentation of the data |
mediawiki | MediaWiki table |
[f1] | quoted from whatis.com. Technical details can be found at MSDN XLS |
[f2] | xlsx is used by MS-Excel 2007, more information can be found at MSDN XLSX |
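The csvz and tsvz entries are zip archives holding delimited text files. A minimal standard-library sketch, assuming one member file per sheet named after the sheet (an illustrative convention, not a statement of pyexcel-io's exact archive layout); `write_zipped` and `read_zipped` are hypothetical helpers, and switching the delimiter and extension is all that separates csvz from tsvz here:

```python
import csv
import io
import zipfile
from typing import Dict, List

Sheets = Dict[str, List[List[str]]]


def write_zipped(sheets: Sheets, delimiter: str = ",", extension: str = "csv") -> bytes:
    """Pack each sheet as <name>.<extension> inside one in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, rows in sheets.items():
            text = io.StringIO()
            csv.writer(text, delimiter=delimiter).writerows(rows)
            archive.writestr(f"{name}.{extension}", text.getvalue())
    return buffer.getvalue()


def read_zipped(data: bytes, delimiter: str = ",", extension: str = "csv") -> Sheets:
    """Unpack every <name>.<extension> member back into rows of strings."""
    sheets: Sheets = {}
    suffix = "." + extension
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for member in archive.namelist():
            if member.endswith(suffix):
                text = archive.read(member).decode("utf-8")
                reader = csv.reader(io.StringIO(text), delimiter=delimiter)
                sheets[member[: -len(suffix)]] = list(reader)
    return sheets


book = {"Sheet1": [["Name", "Age"], ["Adam", "28"]], "Sheet2": [["x"], ["1"]]}
csvz = write_zipped(book)                                    # csv members in a zip
tsvz = write_zipped(book, delimiter="\t", extension="tsv")   # tsv members in a zip
assert read_zipped(csvz) == book
assert read_zipped(tsvz, delimiter="\t", extension="tsv") == book
```

Values come back as strings because CSV carries no type information, which is why the sample book uses strings throughout.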
Usage
Suppose you want to process the following excel data:
Name | Age |
---|---|
Adam | 28 |
Beatrice | 29 |
Ceri | 30 |
Dean | 26 |
Here is an example usage:
>>> import pyexcel as pe
>>> records = pe.iget_records(file_name="your_file.xls")
>>> for record in records:
... print("%s is aged at %d" % (record['Name'], record['Age']))
Adam is aged at 28
Beatrice is aged at 29
Ceri is aged at 30
Dean is aged at 26
>>> pe.free_resources()
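Each record that iget_records yields behaves like a dict keyed by the header row, as `record['Name']` shows. As a rough standard-library analogue, assuming the Name/Age data from the output above were saved as CSV text (`CSV_TEXT` and `read_records` are hypothetical), `csv.DictReader` produces the same record shape. This is not how pyexcel reads xls files, and because the whole text is already in memory there is no counterpart to `pe.free_resources()`.

```python
import csv
import io
from typing import Dict, List

# Hypothetical CSV copy of the Name/Age data used in the example above.
CSV_TEXT = "Name,Age\nAdam,28\nBeatrice,29\nCeri,30\nDean,26\n"


def read_records(text: str) -> List[Dict[str, str]]:
    """Return one dict per data row, keyed by the header row (values stay strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


for record in read_records(CSV_TEXT):
    # csv has no types, so convert Age before formatting it with %d.
    print("%s is aged at %d" % (record["Name"], int(record["Age"])))
```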
New tutorial
- One-liners
- Stream APIs for big files: a set of two-liners
- For web developer
- Pyexcel data renderers
- Sheet
- Book
- Working with databases
Old tutorial
- Work with excel files
- Work with excel files in memory
- Sheet: Data conversion
- How to obtain records from an excel sheet
- How to save a Python array as an excel file
- How to save a Python array as a csv file with a special delimiter
- How to get a dictionary from an excel sheet
- How to obtain a dictionary from a multiple sheet book
- How to save a dictionary of two dimensional array as an excel file
- How to import an excel sheet to a database using SQLAlchemy
- How to open an xls file and save it as csv
- How to open an xls file and save it as xlsx
- How to open an xls multiple-sheet excel book and save it as csv
- Dot notation for data source
- Read partial data
- Sheet: Data Access
- Sheet: Data manipulation
- Sheet: Data filtering
- Sheet: Formatting
- Book: Sheet operations
Cook book
- Recipes
- Update one column of a data file
- Update one row of a data file
- Merge two files into one
- Select candidate columns of two files and form a new one
- Merge two files into a book where each file becomes a sheet
- Merge all excel files in a directory into a book where each file becomes a sheet
- Split a book into single sheet files
- Extract just one sheet from a book
- Loading from other sources
Real world cases
API documentation
Developer’s guide
Change log
- What’s breaking in 0.6.0
- What’s breaking in 0.5.9
- Migrate away from 0.4.3
- Migrate from 0.2.x to 0.3.0+
- Migrate from 0.2.1 to 0.2.2+
- Migrate from 0.1.x to 0.2.x
- Change log
- 0.6.1 - 02.05.2020
- 0.6.0 - 21.04.2020
- 0.5.15 - 07.07.2019
- 0.5.14 - 12.06.2019
- 0.5.13 - 12.03.2019
- 0.5.12 - 25.02.2019
- 0.5.11 - 22.02.2019
- 0.5.10 - 03.12.2018
- 0.5.9.1 - 30.08.2018
- 0.5.9 - 30.08.2018
- 0.5.8 - unreleased
- 0.5.7 - 11.01.2018
- 0.5.6 - 23.10.2017
- 0.5.5 - 20.10.2017
- 0.5.4 - 27.09.2017
- 0.5.3 - 01.08.2017
- 0.5.2 - 26.07.2017
- 0.5.1 - 12.06.2017
- 0.5.0 - 19.06.2017
- 0.4.5 - 17.03.2017
- 0.4.4 - 06.02.2017
- 0.4.3 - 26.01.2017
- 0.4.2 - 17.01.2017
- 0.4.1 - 23.12.2016
- 0.4.0 - 22.12.2016
- 0.3.3 - 07.11.2016
- 0.3.2 - 02.11.2016
- 0.3.0 - 28.10.2016
- 0.2.5 - 31.08.2016
- 0.2.4 - 14.07.2016
- 0.2.3 - 11.07.2016
- 0.2.2 - 01.06.2016
- 0.2.1 - 23.04.2016
- 0.2.0 - 17.01.2016
- 0.1.7 - 03.07.2015
- 0.1.6 - 13.06.2015
- 0.0.13 - 07.02.2015
- 0.0.12 - 25.01.2015
- 0.0.10 - 15.12.2015
- 0.0.4 - 12.10.2014
- 0.0.1 - 14.09.2014
- Note on pypy and lxml