pyexcel - Let you focus on data, instead of file formats¶
Author: | C.W. |
---|---|
Source code: | http://github.com/pyexcel/pyexcel.git |
Issues: | http://github.com/pyexcel/pyexcel/issues |
License: | New BSD License |
Development: | 0.4.5 |
Released: | 0.4.4 |
Generated: | Apr 19, 2017 |
Introduction¶
pyexcel provides one application programming interface to read, manipulate and write data in different excel formats. This library makes information processing involving excel files an enjoyable task. The data in excel files can be turned into array or dict with least code, vice versa. This library focuses on data processing using excel files as storage media hence fonts, colors and charts were not and will not be considered.
The idea originated from the common usability problem when developing an excel file driven web applications for non-technical office workers: such as office assistant, human resource administrator. The fact is that not all people know the difference among various excel formats: csv, xls, xlsx. Instead of training those people about file formats, this library helps web developers to handle most of the excel file formats by providing a common programming interface. To add a specific excel file format to you application, all you need is to install an extra pyexcel plugin. No code change to your application. Looking at the community, this library and its associated ones try to become a small and easy to install alternative to Pandas.
Note
Since version 0.2.2, no longer a plugin should be explicitly imported. They are imported if they are installed. Please use pip to manage the plugins.
Installation¶
You can install it via pip:
$ pip install pyexcel
or clone it and install it:
$ git clone http://github.com/pyexcel/pyexcel.git
$ cd pyexcel
$ python setup.py install
For individual excel file formats, please install them as you wish:
Package name | Supported file formats | Dependencies | Python versions |
---|---|---|---|
pyexcel-io | csv, csvz [1], tsv, tsvz [2] | 2.6, 2.7, 3.3, 3.4, 3.5, 3.6 pypy | |
pyexcel-xls | xls, xlsx(read only), xlsm(read only) | xlrd, xlwt | same as above |
pyexcel-xlsx | xlsx | openpyxl | same as above |
pyexcel-xlsxw | xlsx(write only) | XlsxWriter | same as above |
pyexcel-ods3 | ods | ezodf, lxml | 2.6, 2.7, 3.3, 3.4 3.5, 3.6 |
pyexcel-ods | ods | odfpy | same as above |
pyexcel-odsr | ods(read only) | lxml | same as above |
pyexcel-text | (write only)json, rst, mediawiki, html, latex, grid, pipe, orgtbl, plain simple | tabulate | 2.6, 2.7, 3.3, 3.4 3.5, pypy, pypy3 |
Footnotes
[1] | zipped csv file |
[2] | zipped tsv file |
For compatibility tables of pyexcel-io plugins, please click here
pyexcel | pyexcel-io | pyexcel-text |
---|---|---|
0.4.0+ | 0.3.0 | 0.2.5 |
0.3.0+ | 0.2.3 | 0.2.4 |
0.2.2+ | 0.2.0+ | 0.2.1+ |
0.2.1 | 0.1.0 | 0.2.0 |
0.2.0 | 0.1.0 | 0.1.0+ |
Usage¶
Suppose you want to process the following excel data :
Name | Age |
---|---|
Adam | 28 |
Beatrice | 29 |
Ceri | 30 |
Dean | 26 |
Here are the example usages:
>>> import pyexcel as pe
>>> records = pe.iget_records(file_name="your_file.xls")
>>> for record in records:
... print("%s is aged at %d" % (record['Name'], record['Age']))
Adam is aged at 28
Beatrice is aged at 29
Ceri is aged at 30
Dean is aged at 26
Design¶
Introduction¶
This section introduces Excel data models, its representing data structures and provides an overview of formatting, transformation, manipulation supported by pyexcel
Data models and data structures¶
When dealing with excel files, there are three primary objects: cell, sheet and book. A book contains one or more sheets and a sheet is consisted of a sheet name and a two dimensional array of cells. Although a sheet can contain charts and a cell can have formula, styling properties, this library ignores them and pay attention to the data in the cell and its data type. So, in the context of this library, the definition of those three concepts are:
concept | definition | pyexcel data model |
---|---|---|
a cell | is a data unit | a Python data type |
a sheet | is a named two dimensional array of data units | Sheet |
a book | is a dictionary of two dimensional array of data units. | Book |
Data source¶
The most popular data source is an excel file. Libre Office/Microsoft Excel could easily generate an new excel file of desired format. Besides a physical file, this library recognizes additional three additional sources:
- Excel files in computer memory. For example when a file was uploaded to a Python server for information processing, if it is relatively small, it will be stored in memory.
- Database tables. For example, a client would like to have a snapshot of some database table in an excel file and ask it to be sent to him.
- Python structures. For example, a developer may have scrapped a site and hence stored data in Python array or dictionary. He may want to save those information as a file.
Data format¶
This library and its plugins support most of the frequently used excel file formats.
file format | definition |
---|---|
csv | comma separated values |
tsv | tab separated values |
csvz | a zip file that contains one or many csv files |
tsvz | a zip file that contains one or many tsv files |
xls | a spreadsheet file format created by MS-Excel 97-2003 [1] |
xlsx | MS-Excel Extensions to the Office Open XML SpreadsheetML File Format. [2] |
xlsm | an MS-Excel Macro-Enabled Workbook file |
ods | open document spreadsheet |
json | java script object notation |
html | html table of the data structure |
simple | simple presentation |
rst | rStructured Text presentation of the data |
mediawiki | media wiki table |
See also A list of file formats supported by external plugins.
Data transformation¶
Quite often, a developer would like to have the excel data in a Python data structures. This library supports the conversions from previous three data source to the following list of data structures, and vice versa.
Psudo name | Python name | Related model |
---|---|---|
two dimensional array | a list of lists | Sheet |
a dictionary of one dimensional arrays | a dictionary of lists | Sheet |
a list of dictionaries | a list of dictionaries | Sheet |
a dictionary of two dimensional arrays | a dictionary of lists of lists | Book |
Examples:
>>> two_dimensional_list = [
... [1, 2, 3, 4],
... [5, 6, 7, 8],
... [9, 10, 11, 12],
... ]
>>> a_dictionary_of_one_dimensional_arrays = {
... "Column 1": [1, 2, 3, 4],
... "Column 2": [5, 6, 7, 8],
... "Column 3": [9, 10, 11, 12],
... }
>>> a_list_of_dictionaries = [
... {
... "Name": 'Adam',
... "Age": 28
... },
... {
... "Name": 'Beatrice',
... "Age": 29
... },
... {
... "Name": 'Ceri',
... "Age": 30
... },
... {
... "Name": 'Dean',
... "Age": 26
... }
... ]
>>> a_dictionary_of_two_dimensional_arrays = {
... 'Sheet 1':
... [
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0],
... [7.0, 8.0, 9.0]
... ],
... 'Sheet 2':
... [
... ['X', 'Y', 'Z'],
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0]
... ],
... 'Sheet 3':
... [
... ['O', 'P', 'Q'],
... [3.0, 2.0, 1.0],
... [4.0, 3.0, 2.0]
... ]
... }
Data manipulations¶
The main operation on a cell involves cell access, formatting and cleansing. The main operation on a sheet involves the group access to a row or a column, data filtering and data transformation. The main operation in a book is obtain access to individual sheets.
Data transcoding¶
For various reasons, the data in one format is to be transcoded into another format. This library provides the transcoding tunnel for data transcoding in between supported file formats.
[1] | quoted from whatis.com. Technical details can be found at MSDN XLS |
[2] | xlsx is used by MS-Excel 2007, more information can be found at MSDN XLSX |
Signature functions¶
Import data into Python¶
This library provides one application programming interface to read data from one of the following data sources:
- physical file
- memory file
- SQLAlchemy table
- Django Model
- Python data structures: dictionary, records and array
and to transform them into one of the data structures:
Four data access functions¶
It is believed that once a Python developer could easily operate on list, dictionary and various mixture of both. This library provides four module level functions to help you obtain excel data in those formats. Please refer to “A list of module level functions”, the first three functions operates on any one sheet from an excel book and the fourth one returns all data in all sheets in an excel book.
Functions | Name | Python name |
---|---|---|
get_array() |
two dimensional array | a list of lists |
get_dict() |
a dictionary of one dimensional arrays | an ordered dictionary of lists |
get_records() |
a list of dictionaries | a list of dictionaries |
get_book_dict() |
a dictionary of two dimensional arrays | a dictionary of lists of lists |
See also:
- How to get an array from an excel sheet
- How to get a dictionary from an excel sheet
- How to obtain records from an excel sheet
- How to obtain a dictionary from a multiple sheet book
The following two variants of the data access function use generator and should work well with big data files
Functions | Name | Python name |
---|---|---|
iget_array() |
|
a generator of a list of lists |
iget_records() |
a memory efficient list list of dictionaries | a generator of a list of dictionaries |
Two native functions¶
In cases where the excel data needs custom manipulations, a pyexcel user got a few choices: one is to use Sheet
and Book
, the other is to look for more sophisticated ones:
- Pandas, for numerical analysis
- Do-it-yourself
Functions | Returns |
---|---|
get_sheet() |
Sheet |
get_book() |
Book |
For all six functions, you can pass on the same command parameters while the return value is what the function says.
Export data from Python¶
This library provides one application programming interface to transform them into one of the data structures:
and write to one of the following data sources:
- physical file
- memory file
- SQLAlchemy table
- Django Model
- Python data structures: dictionary, records and array
Here are the two functions:
Functions | Description |
---|---|
save_as() |
Works well with single sheet file |
isave_as() |
Works well with big data files |
save_book_as() |
|
If you would only use these two functions to do format transcoding, you may enjoy a
speed boost using isave_as()
and save_book_as()
,
because they use yield keyword and minize memory footprint.
save_as()
uses Sheet
, which reads all data into
memory.
See also:
Data transportation/transcoding¶
Based the capability of this library, it is capable of transporting your data in between any of these data sources:
- physical file
- memory file
- SQLAlchemy table
- Django Model
- Python data structures: dictionary, records and array
See also:
Tutorial¶
Work with excel files¶
Warning
The pyexcel DOES NOT consider Fonts, Styles, Formulas and Charts at all. When you load a stylish excel and update it, you definitely will lose all those.
Add a new row to an existing file¶
Suppose you have one data file as the following:
example.xls
Column 1 | Column 2 | Column 3 |
---|---|---|
1 | 4 | 7 |
2 | 5 | 8 |
3 | 6 | 9 |
And you want to add a new row:
12, 11, 10
Here is the code:
>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="example.xls")
>>> sheet.row += [12, 11, 10]
>>> sheet.save_as("new_example.xls")
>>> pe.get_sheet(file_name="new_example.xls")
pyexcel_sheet1:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+----------+----------+----------+
| 1 | 4 | 7 |
+----------+----------+----------+
| 2 | 5 | 8 |
+----------+----------+----------+
| 3 | 6 | 9 |
+----------+----------+----------+
| 12 | 11 | 10 |
+----------+----------+----------+
Update an existing row to an existing file¶
Suppose you want to update the last row of the example file as:
[‘N/A’, ‘N/A’, ‘N/A’]
Here is the sample code:
.. code-block:: python
>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="example.xls")
>>> sheet.row[3] = ['N/A', 'N/A', 'N/A']
>>> sheet.save_as("new_example1.xls")
>>> pe.get_sheet(file_name="new_example1.xls")
pyexcel_sheet1:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+----------+----------+----------+
| 1 | 4 | 7 |
+----------+----------+----------+
| 2 | 5 | 8 |
+----------+----------+----------+
| N/A | N/A | N/A |
+----------+----------+----------+
Add a new column to an existing file¶
And you want to add a column instead:
[“Column 4”, 10, 11, 12]
Here is the code:
>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="example.xls")
>>> sheet.column += ["Column 4", 10, 11, 12]
>>> sheet.save_as("new_example2.xls")
>>> pe.get_sheet(file_name="new_example2.xls")
pyexcel_sheet1:
+----------+----------+----------+----------+
| Column 1 | Column 2 | Column 3 | Column 4 |
+----------+----------+----------+----------+
| 1 | 4 | 7 | 10 |
+----------+----------+----------+----------+
| 2 | 5 | 8 | 11 |
+----------+----------+----------+----------+
| 3 | 6 | 9 | 12 |
+----------+----------+----------+----------+
Update an existing column to an existing file¶
Again let’s update “Column 3” with:
[100, 200, 300]
Here is the sample code:
>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="example.xls")
>>> sheet.column[2] = ["Column 3", 100, 200, 300]
>>> sheet.save_as("new_example3.xls")
>>> pe.get_sheet(file_name="new_example3.xls")
pyexcel_sheet1:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+----------+----------+----------+
| 1 | 4 | 100 |
+----------+----------+----------+
| 2 | 5 | 200 |
+----------+----------+----------+
| 3 | 6 | 300 |
+----------+----------+----------+
Alternatively, you could have done like this:
>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="example.xls", name_columns_by_row=0)
>>> sheet.column["Column 3"] = [100, 200, 300]
>>> sheet.save_as("new_example4.xls")
>>> pe.get_sheet(file_name="new_example4.xls")
pyexcel_sheet1:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+----------+----------+----------+
| 1 | 4 | 100 |
+----------+----------+----------+
| 2 | 5 | 200 |
+----------+----------+----------+
| 3 | 6 | 300 |
+----------+----------+----------+
How about the same alternative solution to previous row based example? Well, you’d better to have the following kind of data
row_example.xls
Row 1 | 1 | 2 | 3 |
Row 2 | 4 | 5 | 6 |
Row 3 | 7 | 8 | 9 |
And then you want to update “Row 3” with for example:
[100, 200, 300]
These code would do the job:
>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="row_example.xls", name_rows_by_column=0)
>>> sheet.row["Row 3"] = [100, 200, 300]
>>> sheet.save_as("new_example5.xls")
>>> pe.get_sheet(file_name="new_example5.xls")
pyexcel_sheet1:
+-------+-----+-----+-----+
| Row 1 | 1 | 2 | 3 |
+-------+-----+-----+-----+
| Row 2 | 4 | 5 | 6 |
+-------+-----+-----+-----+
| Row 3 | 100 | 200 | 300 |
+-------+-----+-----+-----+
Work with excel files in memory¶
Excel files in memory can be manipulated directly without saving it to physical disk and vice versa. This is useful in excel file handling at file upload or in excel file download. For example:
>>> import pyexcel
>>> content = "1,2,3\n3,4,5"
>>> sheet = pyexcel.get_sheet(file_type="csv", file_content=content)
>>> sheet.csv
'1,2,3\r\n3,4,5\r\n'
file type as its attributes¶
Since version 0.3.0, each supported file types became an attribute of the Sheet and Book class. What it means is that:
- Read the content in memory
- Set the content in memory
For example, after you have your Sheet and Book instance, you could access its content in a support file type by using its dot notation. The code in previous section could be rewritten as:
>>> import pyexcel
>>> content = "1,2,3\n3,4,5"
>>> sheet = pyexcel.Sheet()
>>> sheet.csv = content
>>> sheet.array
[[1, 2, 3], [3, 4, 5]]
Read any supported excel and respond its content in json¶
You can find a real world example in examples/memoryfile/ directory: pyexcel_server.py. Here is the example snippet
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | def upload():
if request.method == 'POST' and 'excel' in request.files:
# handle file upload
filename = request.files['excel'].filename
extension = filename.split(".")[-1]
# Obtain the file extension and content
# pass a tuple instead of a file name
content = request.files['excel'].read()
if sys.version_info[0] > 2:
# in order to support python 3
# have to decode bytes to str
content = content.decode('utf-8')
sheet = pe.get_sheet(file_type=extension, file_content=content)
# then use it as usual
sheet.name_columns_by_row(0)
# respond with a json
return jsonify({"result": sheet.dict})
return render_template('upload.html')
|
request.files[‘excel’] in line 4 holds the file object. line 5 finds out the file extension. line 13 obtains a sheet instance. line 15 uses the first row as data header. line 17 sends the json representation of the excel file back to client browser.
Write to memory and respond to download¶
1 2 3 4 5 6 7 8 9 10 11 12 | data = [
[...],
...
]
@app.route('/download')
def download():
sheet = pe.Sheet(data)
output = make_response(sheet.csv)
output.headers["Content-Disposition"] = "attachment; filename=export.csv"
output.headers["Content-type"] = "text/csv"
return output
|
make_response is a Flask utility to make a memory content as http response.
Note
You can find the corresponding source code at examples/memoryfile
Relevant packages¶
Readily made plugins have been made on top of this example. Here is a list of them:
framework | plugin/middleware/extension |
---|---|
Flask | Flask-Excel |
Django | django-excel |
Pyramid | pyramid-excel |
And you may make your own by using pyexcel-webio
Sheet: Data conversion¶
How to obtain records from an excel sheet¶
Suppose you want to process the following excel data :
Name | Age |
---|---|
Adam | 28 |
Beatrice | 29 |
Ceri | 30 |
Dean | 26 |
Here are the example code:
>>> import pyexcel as pe
>>> records = pe.get_records(file_name="your_file.xls")
>>> for record in records:
... print("%s is aged at %d" % (record['Name'], record['Age']))
Adam is aged at 28
Beatrice is aged at 29
Ceri is aged at 30
Dean is aged at 26
How to get an array from an excel sheet¶
Suppose you have a csv, xls, xlsx file as the following:
1 | 2 | 3 |
4 | 5 | 6 |
7 | 8 | 9 |
The following code will give you the data in json:
>>> import pyexcel
>>> # "example.csv","example.xlsx","example.xlsm"
>>> my_array = pyexcel.get_array(file_name="example.xls")
>>> my_array
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
How to save an python array as an excel file¶
Suppose you have the following array:
>>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
And here is the code to save it as an excel file
>>> import pyexcel
>>> pyexcel.save_as(array=data, dest_file_name="example.xls")
Let’s verify it:
>>> pyexcel.get_sheet(file_name="example.xls")
pyexcel_sheet1:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+
How to save an python array as a csv file with special delimiter¶
Suppose you have the following array:
>>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
And here is the code to save it as an excel file
>>> import pyexcel
>>> pyexcel.save_as(array=data,
... dest_file_name="example.csv",
... dest_delimiter=':')
Let’s verify it:
>>> with open("example.csv") as f:
... for line in f.readlines():
... print(line.rstrip())
...
1:2:3
4:5:6
7:8:9
How to get a dictionary from an excel sheet¶
Suppose you have a csv, xls, xlsx file as the following:
Column 1 | Column 2 | Column 3 |
---|---|---|
1 | 4 | 7 |
2 | 5 | 8 |
3 | 6 | 9 |
The following code will give you data series in a dictionary:
>>> import pyexcel
>>> from pyexcel._compact import OrderedDict
>>> my_dict = pyexcel.get_dict(file_name="example_series.xls", name_columns_by_row=0)
>>> isinstance(my_dict, OrderedDict)
True
>>> for key, values in my_dict.items():
... print({str(key): values})
{'Column 1': [1, 4, 7]}
{'Column 2': [2, 5, 8]}
{'Column 3': [3, 6, 9]}
Please note that my_dict is an OrderedDict.
How to obtain a dictionary from a multiple sheet book¶
Suppose you have a multiple sheet book as the following:
1 | 2 | 3 |
4 | 5 | 6 |
7 | 8 | 9 |
X | Y | Z |
1 | 2 | 3 |
4 | 5 | 6 |
O | P | Q |
3 | 2 | 1 |
4 | 3 | 2 |
Here is the code to obtain those sheets as a single dictionary:
>>> import pyexcel
>>> import json
>>> book_dict = pyexcel.get_book_dict(file_name="book.xls")
>>> isinstance(book_dict, OrderedDict)
True
>>> for key, item in book_dict.items():
... print(json.dumps({key: item}))
{"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
{"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]}
{"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]}
How to save a dictionary of two dimensional array as an excel file¶
Suppose you want to save the below dictionary to an excel file
>>> a_dictionary_of_two_dimensional_arrays = {
... 'Sheet 1':
... [
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0],
... [7.0, 8.0, 9.0]
... ],
... 'Sheet 2':
... [
... ['X', 'Y', 'Z'],
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0]
... ],
... 'Sheet 3':
... [
... ['O', 'P', 'Q'],
... [3.0, 2.0, 1.0],
... [4.0, 3.0, 2.0]
... ]
... }
Here is the code:
>>> pyexcel.save_book_as(
... bookdict=a_dictionary_of_two_dimensional_arrays,
... dest_file_name="book.xls"
... )
If you want to preserve the order of sheets in your dictionary, you have to pass on an ordered dictionary to the function itself. For example:
>>> data = OrderedDict()
>>> data.update({"Sheet 2": a_dictionary_of_two_dimensional_arrays['Sheet 2']})
>>> data.update({"Sheet 1": a_dictionary_of_two_dimensional_arrays['Sheet 1']})
>>> data.update({"Sheet 3": a_dictionary_of_two_dimensional_arrays['Sheet 3']})
>>> pyexcel.save_book_as(bookdict=data, dest_file_name="book.xls")
Let’s verify its order:
>>> book_dict = pyexcel.get_book_dict(file_name="book.xls")
>>> for key, item in book_dict.items():
... print(json.dumps({key: item}))
{"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]}
{"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
{"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]}
Please notice that “Sheet 2” is the first item in the book_dict, meaning the order of sheets are preserved.
How to an excel sheet to a database using SQLAlchemy¶
Note
You can find the complete code of this example in examples folder on github
Before going ahead, let’s import the needed components and initialize sql engine and table base:
>>> from sqlalchemy import create_engine
>>> from sqlalchemy.ext.declarative import declarative_base
>>> from sqlalchemy import Column , Integer, String, Float, Date
>>> from sqlalchemy.orm import sessionmaker
>>> engine = create_engine("sqlite:///birth.db")
>>> Base = declarative_base()
>>> Session = sessionmaker(bind=engine)
Let’s suppose we have the following database model:
>>> class BirthRegister(Base):
... __tablename__='birth'
... id=Column(Integer, primary_key=True)
... name=Column(String)
... weight=Column(Float)
... birth=Column(Date)
Let’s create the table:
>>> Base.metadata.create_all(engine)
Now here is a sample excel file to be saved to the table:
name | weight | birth |
---|---|---|
Adam | 3.4 | 2015-02-03 |
Smith | 4.2 | 2014-11-12 |
Here is the code to import it:
>>> session = Session() # obtain a sql session
>>> pyexcel.save_as(file_name="birth.xls", name_columns_by_row=0, dest_session=session, dest_table=BirthRegister)
Done it. It is that simple. Let’s verify what has been imported to make sure.
>>> sheet = pyexcel.get_sheet(session=session, table=BirthRegister)
>>> sheet
birth:
+------------+----+-------+--------+
| birth | id | name | weight |
+------------+----+-------+--------+
| 2015-02-03 | 1 | Adam | 3.4 |
+------------+----+-------+--------+
| 2014-11-12 | 2 | Smith | 4.2 |
+------------+----+-------+--------+
How to open an xls file and save it as csv¶
Suppose we want to save previous used example ‘birth.xls’ as a csv file
>>> import pyexcel
>>> pyexcel.save_as(file_name="birth.xls", dest_file_name="birth.csv")
Again it is really simple. Let’s verify what we have gotten:
>>> sheet = pyexcel.get_sheet(file_name="birth.csv")
>>> sheet
birth.csv:
+-------+--------+----------+
| name | weight | birth |
+-------+--------+----------+
| Adam | 3.4 | 03/02/15 |
+-------+--------+----------+
| Smith | 4.2 | 12/11/14 |
+-------+--------+----------+
Note
Please note that csv(comma separate value) file is pure text file. Formula, charts, images and formatting in xls file will disappear no matter which transcoding tool you use. Hence, pyexcel is a quick alternative for this transcoding job.
How to open an xls file and save it as xlsx¶
Warning
Formula, charts, images and formatting in xls file will disappear as pyexcel does not support Formula, charts, images and formatting.
Let use previous example and save it as ods instead
>>> import pyexcel
>>> pyexcel.save_as(file_name="birth.xls",
... dest_file_name="birth.xlsx") # change the file extension
Again let’s verify what we have gotten:
>>> sheet = pyexcel.get_sheet(file_name="birth.xlsx")
>>> sheet
pyexcel_sheet1:
+-------+--------+----------+
| name | weight | birth |
+-------+--------+----------+
| Adam | 3.4 | 03/02/15 |
+-------+--------+----------+
| Smith | 4.2 | 12/11/14 |
+-------+--------+----------+
How to open a xls multiple sheet excel book and save it as csv¶
Well, you write similar codes as before but you will need to use save_book_as()
function.
Dot notation for data source¶
Since version 0.3.0, the data source becomes an attribute of the pyexcel native classes. All support data format is a dot notation away.
For sheet¶
Get content¶
>>> import pyexcel
>>> content = "1,2,3\n3,4,5"
>>> sheet = pyexcel.get_sheet(file_type="csv", file_content=content)
>>> sheet.tsv
'1\t2\t3\r\n3\t4\t5\r\n'
>>> print(sheet.simple)
csv:
- - -
1 2 3
3 4 5
- - -
What’s more, you could as well set value to an attribute, for example:
>>> import pyexcel
>>> content = "1,2,3\n3,4,5"
>>> sheet = pyexcel.Sheet()
>>> sheet.csv = content
>>> sheet.array
[[1, 2, 3], [3, 4, 5]]
You can get the direct access to underneath stream object. In some situation, it is desired.
>>> stream = sheet.stream.tsv
The returned stream object has tsv formatted content for reading.
Set content¶
What you could further do is to set a memory stream of any supported file format to a sheet. For example:
>>> another_sheet = pyexcel.Sheet()
>>> another_sheet.xls = sheet.xls
>>> another_sheet.content
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 3 | 4 | 5 |
+---+---+---+
- Yet, it is possible assign a absolute url to an online excel file to an instance of
pyexcel.Sheet
.
>>> another_sheet.url = "https://github.com/pyexcel/pyexcel/raw/master/examples/basics/multiple-sheets-example.xls"
>>> another_sheet.content
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+
For book¶
The same dot notation is avaiable to pyexcel.Book
as well.
Get content¶
>>> book_dict = {
... 'Sheet 2':
... [
... ['X', 'Y', 'Z'],
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0]
... ],
... 'Sheet 3':
... [
... ['O', 'P', 'Q'],
... [3.0, 2.0, 1.0],
... [4.0, 3.0, 2.0]
... ],
... 'Sheet 1':
... [
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0],
... [7.0, 8.0, 9.0]
... ]
... }
>>> book = pyexcel.get_book(bookdict=book_dict)
>>> book
Sheet 1:
+-----+-----+-----+
| 1.0 | 2.0 | 3.0 |
+-----+-----+-----+
| 4.0 | 5.0 | 6.0 |
+-----+-----+-----+
| 7.0 | 8.0 | 9.0 |
+-----+-----+-----+
Sheet 2:
+-----+-----+-----+
| X | Y | Z |
+-----+-----+-----+
| 1.0 | 2.0 | 3.0 |
+-----+-----+-----+
| 4.0 | 5.0 | 6.0 |
+-----+-----+-----+
Sheet 3:
+-----+-----+-----+
| O | P | Q |
+-----+-----+-----+
| 3.0 | 2.0 | 1.0 |
+-----+-----+-----+
| 4.0 | 3.0 | 2.0 |
+-----+-----+-----+
>>> print(book.rst)
Sheet 1:
= = =
1 2 3
4 5 6
7 8 9
= = =
Sheet 2:
=== === ===
X Y Z
1.0 2.0 3.0
4.0 5.0 6.0
=== === ===
Sheet 3:
=== === ===
O P Q
3.0 2.0 1.0
4.0 3.0 2.0
=== === ===
You can get the direct access to underneath stream object. In some situation, it is desired.
>>> stream = sheet.stream.plain
The returned stream object has the content formatted in plain format for further reading.
Set content¶
Surely, you could set content to an instance of pyexcel.Book
.
>>> other_book = pyexcel.Book()
>>> other_book.bookdict = book_dict
>>> print(other_book.plain)
Sheet 1:
1 2 3
4 5 6
7 8 9
Sheet 2:
X Y Z
1.0 2.0 3.0
4.0 5.0 6.0
Sheet 3:
O P Q
3.0 2.0 1.0
4.0 3.0 2.0
You can set via ‘xls’ attribute too.
>>> another_book = pyexcel.Book()
>>> another_book.xls = other_book.xls
>>> print(another_book.mediawiki)
Sheet 1:
{| class="wikitable" style="text-align: left;"
|+ <!-- caption -->
|-
| align="right"| 1 || align="right"| 2 || align="right"| 3
|-
| align="right"| 4 || align="right"| 5 || align="right"| 6
|-
| align="right"| 7 || align="right"| 8 || align="right"| 9
|}
Sheet 2:
{| class="wikitable" style="text-align: left;"
|+ <!-- caption -->
|-
| X || Y || Z
|-
| 1 || 2 || 3
|-
| 4 || 5 || 6
|}
Sheet 3:
{| class="wikitable" style="text-align: left;"
|+ <!-- caption -->
|-
| O || P || Q
|-
| 3 || 2 || 1
|-
| 4 || 3 || 2
|}
How about setting content via a url?
>>> another_book.url = "https://github.com/pyexcel/pyexcel/raw/master/examples/basics/multiple-sheets-example.xls"
>>> another_book
Sheet 1:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+
Sheet 2:
+---+---+---+
| X | Y | Z |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
Sheet 3:
+---+---+---+
| O | P | Q |
+---+---+---+
| 3 | 2 | 1 |
+---+---+---+
| 4 | 3 | 2 |
+---+---+---+
Getters and Setters¶
You can pass on source specific parameters to getter and setter functions.
>>> content = "1-2-3\n3-4-5"
>>> sheet = pyexcel.Sheet()
>>> sheet.set_csv(content, delimiter="-")
>>> sheet.csv
'1,2,3\r\n3,4,5\r\n'
>>> sheet.get_csv(delimiter="|")
'1|2|3\r\n3|4|5\r\n'
Work with big data sheet¶
Pagination¶
When you are dealing with huge amount of data, e.g. 64GB, obviously you would not like to fill up your memory with those data. Hence pagnation feature is developed to read partial data into memory for processing. You can pagninate by row, by column and by both.
Let’s assume the following file is a huge csv file:
>>> import datetime
>>> import pyexcel as pe
>>> data = [
... [1, 21, 31],
... [2, 22, 32],
... [3, 23, 33],
... [4, 24, 34],
... [5, 25, 35],
... [6, 26, 36]
... ]
>>> pe.save_as(array=data, dest_file_name="your_file.csv")
And let’s pretend to read partial data:
>>> pe.get_sheet(file_name="your_file.csv", start_row=2, row_limit=3)
your_file.csv:
+---+----+----+
| 3 | 23 | 33 |
+---+----+----+
| 4 | 24 | 34 |
+---+----+----+
| 5 | 25 | 35 |
+---+----+----+
And you could as well do the same for columns:
>>> pe.get_sheet(file_name="your_file.csv", start_column=1, column_limit=2)
your_file.csv:
+----+----+
| 21 | 31 |
+----+----+
| 22 | 32 |
+----+----+
| 23 | 33 |
+----+----+
| 24 | 34 |
+----+----+
| 25 | 35 |
+----+----+
| 26 | 36 |
+----+----+
Obvious, you could do both at the same time:
>>> pe.get_sheet(file_name="your_file.csv",
... start_row=2, row_limit=3,
... start_column=1, column_limit=2)
your_file.csv:
+----+----+
| 23 | 33 |
+----+----+
| 24 | 34 |
+----+----+
| 25 | 35 |
+----+----+
The pagination support is available across all pyexcel plugins.
Note
No column pagination support for query sets as data source.
Formatting while transcoding a big data file¶
If you are transcoding a big data set, conventional formatting method would not help unless a on-demand free RAM is available. However, there is a way to minize the memory footprint of pyexcel while the formatting is performed.
Let’s continue from previous example. Suppose we want to transcode “your_file.csv” to “your_file.xls” but increase each element by 1.
What we can do is to define a row renderer function as the following:
>>> def increment_by_one(row):
... for element in row:
... yield element + 1
Then pass it onto save_as function using row_renderer:
>>> pe.isave_as(file_name="your_file.csv",
... row_renderer=increment_by_one,
... dest_file_name="your_file.xlsx")
Note
If the data content is from a generator, isave_as has to be used.
We can verify if it was done correctly:
>>> pe.get_sheet(file_name="your_file.xlsx")
your_file.csv:
+---+----+----+
| 2 | 22 | 32 |
+---+----+----+
| 3 | 23 | 33 |
+---+----+----+
| 4 | 24 | 34 |
+---+----+----+
| 5 | 25 | 35 |
+---+----+----+
| 6 | 26 | 36 |
+---+----+----+
| 7 | 27 | 37 |
+---+----+----+
Sheet: Data Access¶
Random access to individual cell¶
To randomly access a cell of Sheet
instance, two syntax are available:
sheet[row, column]
or:
sheet['A1']
The former syntax is handy when you know the row and column numbers. The latter syntax is introduced to help you convert the excel column header such as “AX” to integer numbers.
Suppose you have the following data, you can get value 5 by reader[2, 2].
Example | X | Y | Z |
a | 1 | 2 | 3 |
b | 4 | 5 | 6 |
c | 7 | 8 | 9 |
Here is the example code showing how you can randomly access a cell:
>>> import pyexcel
>>> sheet = pyexcel.get_sheet(file_name="example.xls")
>>> sheet.content
+---------+---+---+---+
| Example | X | Y | Z |
+---------+---+---+---+
| a | 1 | 2 | 3 |
+---------+---+---+---+
| b | 4 | 5 | 6 |
+---------+---+---+---+
| c | 7 | 8 | 9 |
+---------+---+---+---+
>>> print(sheet[2, 2])
5
>>> print(sheet["C3"])
5
>>> sheet[3, 3] = 10
>>> print(sheet[3, 3])
10
Note
In order to set a value to a cell, please use sheet[row_index, column_index] = new_value
Random access to rows and columns¶
Continue with previous excel file, you can access row and column separately:
>>> sheet.row[1]
['a', 1, 2, 3]
>>> sheet.column[2]
['Y', 2, 5, 8]
Use custom names instead of index¶
Alternatively, it is possible to use the first row to refer to each columns:
>>> sheet.name_columns_by_row(0)
>>> print(sheet[1, "Y"])
5
>>> sheet[1, "Y"] = 100
>>> print(sheet[1, "Y"])
100
You have noticed the row index has been changed. It is because first row is taken as the column names, hence all rows after the first row are shifted. Now accessing the columns are changed too:
>>> sheet.column['Y']
[2, 100, 8]
Hence access the same cell, this statement also works:
>>> sheet.column['Y'][1]
100
Further more, it is possible to use first column to refer to each rows:
>>> sheet.name_rows_by_column(0)
To access the same cell, we can use this line:
>>> sheet.row["b"][1]
100
For the same reason, the row index has been reduced by 1. Since we have named columns and rows, it is possible to access the same cell like this:
>>> print(sheet["b", "Y"])
100
>>> sheet["b", "Y"] = 200
>>> print(sheet["b", "Y"])
200
Note
When you have named your rows and columns, in order to set a value to a cell, please use sheet[row_name, column_name] = new_value
For multiple sheet file, you can regard it as three dimensional array if you use Book
. So, you access each cell via this syntax:
book[sheet_index][row, column]
or:
book["sheet_name"][row, column]
Suppose you have the following sheets:
1 | 2 | 3 |
4 | 5 | 6 |
7 | 8 | 9 |
X | Y | Z |
1 | 2 | 3 |
4 | 5 | 6 |
O | P | Q |
3 | 2 | 1 |
4 | 3 | 2 |
And you can randomly access a cell in a sheet:
>>> book = pyexcel.get_book(file_name="example.xls")
>>> print(book["Sheet 1"][0,0])
1
>>> print(book[0][0,0]) # the same cell
1
Tip
With pyexcel, you can regard single sheet reader as an two dimensional array and multi-sheet excel book reader as a ordered dictionary of two dimensional arrays.
Reading a single sheet excel file¶
Suppose you have a csv, xls, xlsx file as the following:
1 | 2 | 3 |
4 | 5 | 6 |
7 | 8 | 9 |
The following code will give you the data in json:
>>> import json
>>> # "example.csv","example.xlsx","example.xlsm"
>>> sheet = pyexcel.get_sheet(file_name="example.xls")
>>> print(json.dumps(sheet.to_array()))
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Read the sheet as a dictionary¶
Suppose you have a csv, xls, xlsx file as the following:
Column 1 | Column 2 | Column 3 |
---|---|---|
1 | 4 | 7 |
2 | 5 | 8 |
3 | 6 | 9 |
The following code will give you data series in a dictionary:
>>> # "example.xls","example.xlsx","example.xlsm"
>>> sheet = pyexcel.get_sheet(file_name="example_series.xls", name_columns_by_row=0)
>>> sheet.to_dict()
OrderedDict([('Column 1', [1, 4, 7]), ('Column 2', [2, 5, 8]), ('Column 3', [3, 6, 9])])
Can I get an array of dictionaries per each row?¶
Suppose you have the following data:
X | Y | Z |
1 | 2 | 3 |
4 | 5 | 6 |
7 | 8 | 9 |
The following code will produce what you want:
>>> # "example.csv","example.xlsx","example.xlsm"
>>> sheet = pyexcel.get_sheet(file_name="example.xls", name_columns_by_row=0)
>>> records = sheet.to_records()
>>> for record in records:
... keys = sorted(record.keys())
... print("{")
... for key in keys:
... print("'%s':%d" % (key, record[key]))
... print("}")
{
'X':1
'Y':2
'Z':3
}
{
'X':4
'Y':5
'Z':6
}
{
'X':7
'Y':8
'Z':9
}
>>> print(records[0]["X"]) # access first row and first item
1
Writing a single sheet excel file¶
Suppose you have an array as the following:
1 | 2 | 3 |
4 | 5 | 6 |
7 | 8 | 9 |
The following code will write it as an excel file of your choice:
.. testcode::
>>> array = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> # "output.xls" "output.xlsx" "output.ods" "output.xlsm"
>>> sheet = pyexcel.Sheet(array)
>>> sheet.save_as("output.csv")
Suppose you have a dictionary as the following:
Column 1 | Column 2 | Column 3 |
---|---|---|
1 | 4 | 7 |
2 | 5 | 8 |
3 | 6 | 9 |
The following code will write it as an excel file of your choice:
>>> example_dict = {"Column 1": [1, 2, 3], "Column 2": [4, 5, 6], "Column 3": [7, 8, 9]}
>>> # "output.xls" "output.xlsx" "output.ods" "output.xlsm"
>>> sheet = pyexcel.get_sheet(adict=example_dict)
>>> sheet.save_as("output.csv")
Write multiple sheet excel file¶
Suppose you have previous data as a dictionary and you want to save it as multiple sheet excel file:
>>> content = {
... 'Sheet 1':
... [
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0],
... [7.0, 8.0, 9.0]
... ],
... 'Sheet 2':
... [
... ['X', 'Y', 'Z'],
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0]
... ],
... 'Sheet 3':
... [
... ['O', 'P', 'Q'],
... [3.0, 2.0, 1.0],
... [4.0, 3.0, 2.0]
... ]
... }
>>> book = pyexcel.get_book(bookdict=content)
>>> book.save_as("output.xls")
You shall get a xls file
Read multiple sheet excel file¶
Let’s read the previous file back:
>>> book = pyexcel.get_book(file_name="output.xls")
>>> sheets = book.to_dict()
>>> for name in sheets.keys():
... print(name)
Sheet 1
Sheet 2
Sheet 3
Work with data series in a single sheet¶
Suppose you have the following data in any of the supported excel formats again:
Column 1 | Column 2 | Column 3 |
---|---|---|
1 | 4 | 7 |
2 | 5 | 8 |
3 | 6 | 9 |
>>> sheet = pyexcel.get_sheet(file_name="example_series.xls", name_columns_by_row=0)
Play with data¶
You can get headers:
>>> print(list(sheet.colnames))
['Column 1', 'Column 2', 'Column 3']
You can use a utility function to get all in a dictionary:
>>> sheet.to_dict()
OrderedDict([('Column 1', [1, 4, 7]), ('Column 2', [2, 5, 8]), ('Column 3', [3, 6, 9])])
Maybe you want to get only the data without the column headers. You can call rows()
instead:
>>> list(sheet.rows())
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
You can get data from the bottom to the top one by calling rrows()
instead:
>>> list(sheet.rrows())
[[7, 8, 9], [4, 5, 6], [1, 2, 3]]
You might want the data arranged vertically. You can call columns()
instead:
>>> list(sheet.columns())
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
You can get columns in reverse sequence as well by calling rcolumns()
instead:
>>> list(sheet.rcolumns())
[[3, 6, 9], [2, 5, 8], [1, 4, 7]]
Do you want to flatten the data? You can get the content in one dimensional array. If you are interested in playing with one dimensional enumeration, you can check out these functions enumerate()
, reverse()
, vertical()
, and rvertical()
:
>>> list(sheet.enumerate())
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(sheet.reverse())
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> list(sheet.vertical())
[1, 4, 7, 2, 5, 8, 3, 6, 9]
>>> list(sheet.rvertical())
[9, 6, 3, 8, 5, 2, 7, 4, 1]
Sheet: Data manipulation¶
The data in a sheet is represented by Sheet
which maintains the data
as a list of lists. You can regard Sheet
as a two dimensional array
with additional iterators. Random access to individual column and row is exposed
by Column
and Row
Column manipulation¶
Suppose have one data file as the following:
>>> sheet = pyexcel.get_sheet(file_name="example.xls", name_columns_by_row=0)
>>> sheet
pyexcel sheet:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+==========+==========+==========+
| 1 | 4 | 7 |
+----------+----------+----------+
| 2 | 5 | 8 |
+----------+----------+----------+
| 3 | 6 | 9 |
+----------+----------+----------+
And you want to update Column 2
with these data: [11, 12, 13]
>>> sheet.column["Column 2"] = [11, 12, 13]
>>> sheet.column[1]
[11, 12, 13]
>>> sheet
pyexcel sheet:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+==========+==========+==========+
| 1 | 11 | 7 |
+----------+----------+----------+
| 2 | 12 | 8 |
+----------+----------+----------+
| 3 | 13 | 9 |
+----------+----------+----------+
Remove one column of a data file¶
If you want to remove Column 2
, you can just call:
>>> del sheet.column["Column 2"]
>>> sheet.column["Column 3"]
[7, 8, 9]
The sheet content will become:
>>> sheet
pyexcel sheet:
+----------+----------+
| Column 1 | Column 3 |
+==========+==========+
| 1 | 7 |
+----------+----------+
| 2 | 8 |
+----------+----------+
| 3 | 9 |
+----------+----------+
Append more columns to a data file¶
Continue from previous example. Suppose you want add two more columns to the data file
Column 4 | Column 5 |
---|---|
10 | 13 |
11 | 14 |
12 | 15 |
Here is the example code to append two extra columns:
>>> extra_data = [
... ["Column 4", "Column 5"],
... [10, 13],
... [11, 14],
... [12, 15]
... ]
>>> sheet2 = pyexcel.Sheet(extra_data)
>>> sheet.column += sheet2
>>> sheet.column["Column 4"]
[10, 11, 12]
>>> sheet.column["Column 5"]
[13, 14, 15]
Here is what you will get:
>>> sheet
pyexcel sheet:
+----------+----------+----------+----------+
| Column 1 | Column 3 | Column 4 | Column 5 |
+==========+==========+==========+==========+
| 1 | 7 | 10 | 13 |
+----------+----------+----------+----------+
| 2 | 8 | 11 | 14 |
+----------+----------+----------+----------+
| 3 | 9 | 12 | 15 |
+----------+----------+----------+----------+
Cherry pick some columns to be removed¶
Suppose you have the following data:
>>> data = [
... ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
... [1,2,3,4,5,6,7,9],
... ]
>>> sheet = pyexcel.Sheet(data, name_columns_by_row=0)
>>> sheet
pyexcel sheet:
+---+---+---+---+---+---+---+---+
| a | b | c | d | e | f | g | h |
+===+===+===+===+===+===+===+===+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 9 |
+---+---+---+---+---+---+---+---+
And you want to remove columns named as: ‘a’, ‘c, ‘e’, ‘h’. This is how you do it:
>>> del sheet.column['a', 'c', 'e', 'h']
>>> sheet
pyexcel sheet:
+---+---+---+---+
| b | d | f | g |
+===+===+===+===+
| 2 | 4 | 6 | 7 |
+---+---+---+---+
What if the headers are in a different row¶
Suppose you have the following data:
>>> sheet
pyexcel sheet:
+----------+----------+----------+
| 1 | 2 | 3 |
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+----------+----------+----------+
| 4 | 5 | 6 |
+----------+----------+----------+
The way to name your columns is to use index 1:
>>> sheet.name_columns_by_row(1)
Here is what you get:
>>> sheet
pyexcel sheet:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+==========+==========+==========+
| 1 | 2 | 3 |
+----------+----------+----------+
| 4 | 5 | 6 |
+----------+----------+----------+
Row manipulation¶
Suppose you have the following data:
>>> sheet
pyexcel sheet:
+---+---+---+-------+
| a | b | c | Row 1 |
+---+---+---+-------+
| e | f | g | Row 2 |
+---+---+---+-------+
| 1 | 2 | 3 | Row 3 |
+---+---+---+-------+
You can name your rows by column index at 3:
>>> sheet.name_rows_by_column(3)
>>> sheet
pyexcel sheet:
+-------+---+---+---+
| Row 1 | a | b | c |
+-------+---+---+---+
| Row 2 | e | f | g |
+-------+---+---+---+
| Row 3 | 1 | 2 | 3 |
+-------+---+---+---+
Then you can access rows by its name:
>>> sheet.row["Row 1"]
['a', 'b', 'c']
Sheet: Data filtering¶
use filter()
function to apply a filter immediately. The content is modified.
Suppose you have the following data in any of the supported excel formats:
Column 1 | Column 2 | Column 3 |
---|---|---|
1 | 4 | 7 |
2 | 5 | 8 |
3 | 6 | 9 |
>>> import pyexcel
>>> sheet = pyexcel.get_sheet(file_name="example_series.xls", name_columns_by_row=0)
>>> sheet.content
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+==========+==========+==========+
| 1 | 2 | 3 |
+----------+----------+----------+
| 4 | 5 | 6 |
+----------+----------+----------+
| 7 | 8 | 9 |
+----------+----------+----------+
Filter out some data¶
You may want to filter odd rows and print them in an array of dictionaries:
>>> sheet.filter(row_indices=[0, 2])
>>> sheet.content
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+==========+==========+==========+
| 4 | 5 | 6 |
+----------+----------+----------+
Let’s try to further filter out even columns:
>>> sheet.filter(column_indices=[1])
>>> sheet.content
+----------+----------+
| Column 1 | Column 3 |
+==========+==========+
| 4 | 6 |
+----------+----------+
Save the data¶
Let’s save the previous filtered data:
>>> sheet.save_as("example_series_filter.xls")
When you open example_series_filter.xls, you will find these data
Column 1 | Column 3 |
---|---|
2 | 8 |
How to filter out empty rows in my sheet?¶
Suppose you have the following data in a sheet and you want to remove those rows with blanks:
>>> import pyexcel as pe
>>> sheet = pe.Sheet([[1,2,3],['','',''],['','',''],[1,2,3]])
You can use pyexcel.filters.RowValueFilter
, which examines each row, return True if the row should be filtered out. So, let’s define a filter function:
>>> def filter_row(row_index, row):
... result = [element for element in row if element != '']
... return len(result)==0
And then apply the filter on the sheet:
>>> del sheet.row[filter_row]
>>> sheet
pyexcel sheet:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
Sheet: Formatting¶
Previous section has assumed the data is in the format that you want. In reality, you have to
manipulate the data types a bit to suit your needs. Hence, formatters comes into the scene.
use format()
to apply formatter immediately.
Note
int, float andate datetime values are automatically detected in csv files since pyexcel version 0.2.2
Convert a column of numbers to strings¶
Suppose you have the following data:
>>> import pyexcel
>>> data = [
... ["userid","name"],
... [10120,"Adam"],
... [10121,"Bella"],
... [10122,"Cedar"]
... ]
>>> sheet = pyexcel.Sheet(data)
>>> sheet.name_columns_by_row(0)
>>> sheet.column["userid"]
[10120, 10121, 10122]
As you can see, userid column is of int type. Next, let’s convert the column to string format:
>>> sheet.column.format("userid", str)
>>> sheet.column["userid"]
['10120', '10121', '10122']
Cleanse the cells in a spread sheet¶
Sometimes, the data in a spreadsheet may have unwanted strings in all or some cells. Let’s take an example. Suppose we have a spread sheet that contains all strings but it as random spaces before and after the text values. Some field had weird characters, such as “ ”:
>>> data = [
... [" Version", " Comments", " Author "],
... [" v0.0.1 ", " Release versions"," Eda"],
... [" v0.0.2 ", "Useful updates ", " Freud"]
... ]
>>> sheet = pyexcel.Sheet(data)
>>> sheet.content
+-----------------+------------------------------+----------------------+
| Version | Comments | Author |
+-----------------+------------------------------+----------------------+
| v0.0.1 | Release versions | Eda |
+-----------------+------------------------------+----------------------+
| v0.0.2 | Useful updates | Freud |
+-----------------+------------------------------+----------------------+
Now try to create a custom cleanse function:
.. code-block:: python
>>> def cleanse_func(v):
... v = v.replace(" ", "")
... v = v.rstrip().strip()
... return v
...
Then let’s create a SheetFormatter
and apply it:
.. code-block:: python
>>> sheet.map(cleanse_func)
So in the end, you get this:
>>> sheet.content
+---------+------------------+--------+
| Version | Comments | Author |
+---------+------------------+--------+
| v0.0.1 | Release versions | Eda |
+---------+------------------+--------+
| v0.0.2 | Useful updates | Freud |
+---------+------------------+--------+
Book: Sheet operations¶
Access to individual sheets¶
You can access individual sheet of a book via attribute:
>>> book = pyexcel.get_book(file_name="book.xls")
>>> book.sheet3
sheet3:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+
or via array notations:
>>> book["sheet 1"] # there is a space in the sheet name
sheet 1:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
Merge excel books¶
Suppose you have two excel books and each had three sheets. You can merge them and get a new book:
You also can merge indivdual sheets:
>>> book1 = pyexcel.get_book(file_name="book1.xls")
>>> book2 = pyexcel.get_book(file_name="book2.xlsx")
>>> merged_book = book1 + book2
>>> merged_book = book1["Sheet 1"] + book2["Sheet 2"]
>>> merged_book = book1["Sheet 1"] + book2
>>> merged_book = book1 + book2["Sheet 2"]
Manipulate individual sheets¶
merge sheets into a single sheet¶
Suppose you want to merge many csv files row by row into a new sheet.
>>> import glob
>>> merged = pyexcel.Sheet()
>>> for file in glob.glob("*.csv"):
... merged.row += pyexcel.get_sheet(file_name=file)
>>> merged.save_as("merged.csv")
How do I read a book, process it and save to a new book¶
Yes, you can do that. The code looks like this:
import pyexcel
book = pyexcel.get_book(file_name="yourfile.xls")
for sheet in book:
# do you processing with sheet
# do filtering?
pass
book.save_as("output.xls")
What would happen if I save a multi sheet book into “csv” file¶
Well, you will get one csv file per each sheet. Suppose you have these code:
>>> content = {
... 'Sheet 1':
... [
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0],
... [7.0, 8.0, 9.0]
... ],
... 'Sheet 2':
... [
... ['X', 'Y', 'Z'],
... [1.0, 2.0, 3.0],
... [4.0, 5.0, 6.0]
... ],
... 'Sheet 3':
... [
... ['O', 'P', 'Q'],
... [3.0, 2.0, 1.0],
... [4.0, 3.0, 2.0]
... ]
... }
>>> book = pyexcel.Book(content)
>>> book.save_as("myfile.csv")
You will end up with three csv files:
>>> import glob
>>> outputfiles = glob.glob("myfile_*.csv")
>>> for file in sorted(outputfiles):
... print(file)
...
myfile__Sheet 1__0.csv
myfile__Sheet 2__1.csv
myfile__Sheet 3__2.csv
and their content is the value of the dictionary at the corresponding key
Alternatively, you could use save_book_as()
function
>>> pyexcel.save_book_as(bookdict=content, dest_file_name="myfile.csv")
After I have saved my multiple sheet book in csv format, how do I get them back¶
First of all, you can read them back individual as csv file using meth:~pyexcel.get_sheet method. Secondly, the pyexcel can do the magic to load all of them back into a book. You will just need to provide the common name before the separator “__”:
>>> book2 = pyexcel.get_book(file_name="myfile.csv")
>>> book2
Sheet 1:
+-----+-----+-----+
| 1.0 | 2.0 | 3.0 |
+-----+-----+-----+
| 4.0 | 5.0 | 6.0 |
+-----+-----+-----+
| 7.0 | 8.0 | 9.0 |
+-----+-----+-----+
Sheet 2:
+-----+-----+-----+
| X | Y | Z |
+-----+-----+-----+
| 1.0 | 2.0 | 3.0 |
+-----+-----+-----+
| 4.0 | 5.0 | 6.0 |
+-----+-----+-----+
Sheet 3:
+-----+-----+-----+
| O | P | Q |
+-----+-----+-----+
| 3.0 | 2.0 | 1.0 |
+-----+-----+-----+
| 4.0 | 3.0 | 2.0 |
+-----+-----+-----+
How to log pyexcel¶
When developing source plugins, it becomes necessary to have log trace available. It helps find out what goes wrong quickly.
The basic step would be to set up logging before pyexcel import statement.
import logging
import logging.config
logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
level=logging.DEBUG)
import pyexcel
And if you would use a complex configuration, you can use the following code.
import logging
import logging.config
logging.config.fileConfig('log.conf')
import pyexcel
And then save the following content as log.conf in your directory:
[loggers]
keys=root, sources, renderers
[handlers]
keys=consoleHandler
[formatters]
keys=custom
[logger_root]
level=INFO
handlers=consoleHandler
[logger_sources]
level=DEBUG
handlers=consoleHandler
qualname=pyexcel.sources.factory
propagate=0
[logger_renderers]
level=DEBUG
handlers=consoleHandler
qualname=pyexcel.renderers.factory
propagate=0
[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=custom
args=(sys.stdout,)
[formatter_custom]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
datefmt=
Migrate away from 0.4.3¶
get_{{file_type}}_stream functions from pyexcel.Sheet and pyexel.Book were introduced since 0.4.3 but were removed since 0.4.4. Please be advised to use save_to_memory functions, Sheet.io.{{file_type}} or Book.io.{{file_type}}.
Migrate from 0.2.x to 0.3.0+¶
Filtering and formatting behavior of pyexcel.Sheet
are simplified. Soft
filter and soft formatter are removed. Extra classes such as iterator, formatter,
filter are removed.
Most of formatting tasks could be achieved using format()
and map()
. and Filtering with filter()
.
Formatting and filtering on row and/or column can be found with
row()
and column()
1. Updated filter function¶
There is no alternative to replace the folowing code:
sheet.filter(pe.OddRowFilter())
You will need to remove odd rows by yourself:
>>> import pyexcel as pe
>>> data = [
... ['1'],
... ['2'],
... ['3'],
... ]
>>> sheet = pe.Sheet(data)
>>> to_remove = []
>>> for index in sheet.row_range():
... if index % 2 == 0:
... to_remove.append(index)
>>> sheet.filter(row_indices=to_remove)
>>> sheet
pyexcel sheet:
+---+
| 2 |
+---+
Or, you could do this:
>>> data = [
... ['1'],
... ['2'],
... ['3'],
... ]
>>> sheet = pe.Sheet(data)
>>> def odd_filter(row_index, _):
... return row_index % 2 == 0
>>> del sheet.row[odd_filter]
>>> sheet
pyexcel sheet:
+---+
| 2 |
+---+
And the same applies to EvenRowFilter, OddColumnFilter, EvenColumnFilter.
2. Updated format function¶
2.1 Replacement of sheetformatter¶
The following formatting code:
sheet.apply_formatter(pe.sheets.formatters.SheetFormatter(int))
can be replaced by:
sheet.format(int)
2.2 Repalcement of row formatters¶
The following code:
row_formatter = pe.sheets.formatters.RowFormatter([1, 2], str)
sheet.add_formatter(row_formatter)
can be replaced by:
sheet.row.format([1, 2], str)
2.3 Replacement of column formatters¶
The following code:
f = NamedColumnFormatter(["Column 1", "Column 3"], str)
sheet.apply_formatter(f)
can be replaced by:
sheet.column.format(["Column 1", "Column 3"], str)
Migrate from 0.2.1 to 0.2.2+¶
1. Explicit imports, no longer needed¶
Please forget about these statements:
import pyexcel.ext.xls
import pyexcel.ext.ods
import pyexcel.ext.xlsx
They are no longer needed. As long as you have pip-installed them, they will be auto-loaded. However, if you do not want some of the plugins, please use pip to uninstall them.
What if you have your code as it is? No harm but a few warnings shown:
Deprecated usage since v0.2.2! Explicit import is no longer required. pyexcel.ext.ods is auto imported.
2. Invalid environment marker: platform_python_implementation==”PyPy”¶
Yes, it is a surprise. Please upgrade setuptools in your environment:
pip install --upgrade setuptools
At the time of writing, setuptools (18.0.1) or setuptools-21.0.0-py2.py3-none-any.whl is installed on author’s computer and worked.
3. How to keep both pyexcel-xls and pyexcel-xlsx¶
As in Issue 20, pyexcel-xls was used for xls and pyexcel-xlsx had to be used for xlsx. Both must co-exist due to requirements. The workaround would failed when auto-import are enabled in v0.2.2. Hence, user of pyexcel in this situation shall use ‘library’ parameter to all signature functions, to instruct pyexcel to use a named library for each function call.
4. pyexcel.get_io is no longer exposed¶
pyexcel.get_io was passed on from pyexcel-io. However, it is no longer exposed. Please use pyexcel_io.manager.RWManager.get_io if you have to.
You are likely to use pyexcel.get_io when you do pyexcel.Sheet.save_to_memory()
or pyexcel.Book.save_to_memory()
where you need to put in a io stream. But actually,
with latest code, you could put in a None.
Migrate from 0.1.x to 0.2.x¶
1. “Writer” is gone, Please use save_as.¶
Here is a piece of legacy code:
w = pyexcel.Writer("afile.csv")
data=[['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 1.1, 1]]
w.write_array(table)
w.close()
The new code is:
>>> data=[['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 1.1, 1]]
>>> pyexcel.save_as(array=data, dest_file_name="afile.csv")
Here is another piece of legacy code:
content = {
"X": [1,2,3,4,5],
"Y": [6,7,8,9,10],
"Z": [11,12,13,14,15],
}
w = pyexcel.Writer("afile.csv")
w.write_dict(self.content)
w.close()
The new code is:
>>> content = {
... "X": [1,2,3,4,5],
... "Y": [6,7,8,9,10],
... "Z": [11,12,13,14,15],
... }
>>> pyexcel.save_as(adict=content, dest_file_name="afile.csv")
Here is yet another piece of legacy code:
data = [
[1, 2, 3],
[4, 5, 6]
]
io = StringIO()
w = pyexcel.Writer(("csv",io))
w.write_rows(data)
w.close()
The new code is:
>>> data = [
... [1, 2, 3],
... [4, 5, 6]
... ]
>>> io = pyexcel.save_as(dest_file_type='csv', array=data)
>>> for line in io.readlines():
... print(line.rstrip())
1,2,3
4,5,6
2. “BookWriter” is gone. Please use save_book_as.¶
Here is a piece of legacy code:
import pyexcel
content = {
"Sheet1": [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
"Sheet2": [[4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]],
"Sheet3": [[u'X', u'Y', u'Z'], [1, 4, 7], [2, 5, 8], [3, 6, 9]]
}
w = pyexcel.BookWriter("afile.csv")
w.write_book_from_dict(content)
w.close()
The replacement code is:
>>> import pyexcel
>>> content = {
... "Sheet1": [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
... "Sheet2": [[4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]],
... "Sheet3": [[u'X', u'Y', u'Z'], [1, 4, 7], [2, 5, 8], [3, 6, 9]]
... }
>>> pyexcel.save_book_as(bookdict=content, dest_file_name="afile.csv")
Cook book¶
Recipes¶
Warning
The pyexcel DOES NOT consider Fonts, Styles and Charts at all. In the resulting excel files, fonts, styles and charts will not be transferred.
These recipes give a one-stop utility functions for known use cases. Similar functionality can be achieved using other application interfaces.
Update one column of a data file¶
Suppose you have one data file as the following:
example.xls
Column 1 | Column 2 | Column 3 |
---|---|---|
1 | 4 | 7 |
2 | 5 | 8 |
3 | 6 | 9 |
And you want to update Column 2
with these data: [11, 12, 13]
Here is the code:
>>> from pyexcel.cookbook import update_columns
>>> custom_column = {"Column 2":[11, 12, 13]}
>>> update_columns("example.xls", custom_column, "output.xls")
Your output.xls will have these data:
Column 1 | Column 2 | Column 3 |
---|---|---|
1 | 11 | 7 |
2 | 12 | 8 |
3 | 13 | 9 |
Update one row of a data file¶
Suppose you have the same data file:
example.xls
Row 1 | 1 | 2 | 3 |
Row 2 | 4 | 5 | 6 |
Row 3 | 7 | 8 | 9 |
And you want to update the second row with these data: [7, 4, 1]
Here is the code:
>>> from pyexcel.cookbook import update_rows
>>> custom_row = {"Row 1":[11, 12, 13]}
>>> update_rows("example.xls", custom_row, "output.xls")
Your output.xls will have these data:
Column 1 | Column 2 | Column 3 |
---|---|---|
7 | 4 | 1 |
2 | 5 | 8 |
3 | 6 | 9 |
Merge two files into one¶
Suppose you want to merge the following two data files:
example.csv
Column 1 | Column 2 | Column 3 |
---|---|---|
1 | 4 | 7 |
2 | 5 | 8 |
3 | 6 | 9 |
example.xls
Column 4 | Column 5 |
---|---|
10 | 12 |
11 | 13 |
The following code will merge the tow into one file, say “output.xls”:
>>> from pyexcel.cookbook import merge_two_files
>>> merge_two_files("example.csv", "example.xls", "output.xls")
The output.xls would have the following data:
Column 1 | Column 2 | Column 3 | Column 4 | Column 5 |
---|---|---|---|---|
1 | 4 | 7 | 10 | 12 |
2 | 5 | 8 | 11 | 13 |
3 | 6 | 9 |
Select candidate columns of two files and form a new one¶
Suppose you have these two files:
example.ods
Column 1 | Column 2 | Column 3 | Column 4 | Column 5 |
---|---|---|---|---|
1 | 4 | 7 | 10 | 13 |
2 | 5 | 8 | 11 | 14 |
3 | 6 | 9 | 12 | 15 |
example.xls
Column 6 | Column 7 | Column 8 | Column 9 | Column 10 |
---|---|---|---|---|
16 | 17 | 18 | 19 | 20 |
>>> data = [
... ["Column 1", "Column 2", "Column 3", "Column 4", "Column 5"],
... [1, 4, 7, 10, 13],
... [2, 5, 8, 11, 14],
... [3, 6, 9, 12, 15]
... ]
>>> s = pyexcel.Sheet(data)
>>> s.save_as("example.csv")
>>> data = [
... ["Column 6", "Column 7", "Column 8", "Column 9", "Column 10"],
... [16, 17, 18, 19, 20]
... ]
>>> s = pyexcel.Sheet(data)
>>> s.save_as("example.xls")
And you want to filter out column 2 and 4 from example.ods, filter out column 6 and 7 and merge them:
Column 1 | Column 3 | Column 5 | Column 8 | Column 9 | Column 10 |
---|---|---|---|---|---|
1 | 7 | 13 | 18 | 19 | 20 |
2 | 8 | 14 | |||
3 | 9 | 15 |
The following code will do the job:
>>> from pyexcel.cookbook import merge_two_readers
>>> sheet1 = pyexcel.get_sheet(file_name="example.csv", name_columns_by_row=0)
>>> sheet2 = pyexcel.get_sheet(file_name="example.xls", name_columns_by_row=0)
>>> del sheet1.column[1, 3, 5]
>>> del sheet2.column[0, 1]
>>> merge_two_readers(sheet1, sheet2, "output.xls")
Merge two files into a book where each file become a sheet¶
Suppose you want to merge the following two data files:
example.csv
Column 1 | Column 2 | Column 3 |
---|---|---|
1 | 4 | 7 |
2 | 5 | 8 |
3 | 6 | 9 |
example.xls
Column 4 | Column 5 |
---|---|
10 | 12 |
11 | 13 |
>>> data = [
... ["Column 1", "Column 2", "Column 3"],
... [1, 2, 3],
... [4, 5, 6],
... [7, 8, 9]
... ]
>>> s = pyexcel.Sheet(data)
>>> s.save_as("example.csv")
>>> data = [
... ["Column 4", "Column 5"],
... [10, 12],
... [11, 13]
... ]
>>> s = pyexcel.Sheet(data)
>>> s.save_as("example.xls")
The following code will merge the tow into one file, say “output.xls”:
>>> from pyexcel.cookbook import merge_all_to_a_book
>>> merge_all_to_a_book(["example.csv", "example.xls"], "output.xls")
The output.xls would have the following data:
example.csv as sheet name and inside the sheet, you have:
Column 1 | Column 2 | Column 3 |
---|---|---|
1 | 4 | 7 |
2 | 5 | 8 |
3 | 6 | 9 |
example.ods as sheet name and inside the sheet, you have:
Column 4 | Column 5 |
---|---|
10 | 12 |
11 | 13 |
Merge all excel files in directory into a book where each file become a sheet¶
The following code will merge every excel files into one file, say “output.xls”:
from pyexcel.cookbook import merge_all_to_a_book
import glob
merge_all_to_a_book(glob.glob("your_csv_directory\*.csv"), "output.xls")
You can mix and match with other excel formats: xls, xlsm and ods. For example, if you are sure you have only xls, xlsm, xlsx, ods and csv files in your_excel_file_directory, you can do the following:
from pyexcel.cookbook import merge_all_to_a_book
import glob
merge_all_to_a_book(glob.glob("your_excel_file_directory\*.*"), "output.xls")
Split a book into single sheet files¶
Suppose you have many sheets in a work book and you would like to separate each into a single sheet excel file. You can easily do this:
>>> from pyexcel.cookbook import split_a_book
>>> split_a_book("megabook.xls", "output.xls")
>>> import glob
>>> outputfiles = glob.glob("*_output.xls")
>>> for file in sorted(outputfiles):
... print(file)
...
Sheet 1_output.xls
Sheet 2_output.xls
Sheet 3_output.xls
for the output file, you can specify any of the supported formats
Extract just one sheet from a book¶
Suppose you just want to extract one sheet from many sheets that exists in a work book and you would like to separate it into a single sheet excel file. You can easily do this:
>>> from pyexcel.cookbook import extract_a_sheet_from_a_book
>>> extract_a_sheet_from_a_book("megabook.xls", "Sheet 1", "output.xls")
>>> if os.path.exists("Sheet 1_output.xls"):
... print("Sheet 1_output.xls exists")
...
Sheet 1_output.xls exists
for the output file, you can specify any of the supported formats
Real world cases¶
Questions and Answers¶
- Python flask writing to a csv file and reading it
- PyQt: Import .xls file and populate QTableWidget?
- How do I write data to csv file in columns and rows from a list in python?
- How to write dictionary values to a csv file using Python
- Python convert csv to xlsx
- How to read data from excel and set data type
- Remove or keep specific columns in csv file
- How can I put a CSV file in an array?
API documentation¶
API Reference¶
This is intended for users of pyexcel.
Signature functions¶
Obtaining data from excel file¶
get_array (**keywords) |
Obtain an array from an excel source |
get_dict ([name_columns_by_row]) |
Obtain a dictionary from an excel source |
get_records ([name_columns_by_row]) |
Obtain a list of records from an excel source |
get_book_dict (**keywords) |
Obtain a dictionary of two dimensional arrays |
get_book (**keywords) |
Get an instance of Book from an excel source |
get_sheet (**keywords) |
Get an instance of Sheet from an excel source |
iget_array (**keywords) |
Obtain a generator of an two dimensional array from an excel source |
iget_records (**keywords) |
Obtain a generator of a list of records from an excel source |
Saving data to excel file¶
save_as (**keywords) |
Save a sheet from a data source to another one |
isave_as (**keywords) |
Save a sheet from a data source to another one with less memory |
save_book_as (**keywords) |
Save a book from a data source to another one |
These flags can be passed on all signature functions:
auto_detect_int¶
Automatically convert float values to integers if the float number has no decimal values(e.g. 1.00). By default, it does the detection. Setting it to False will turn on this behavior
It has no effect on pyexcel-xlsx because it does that by default.
auto_detect_float¶
Automatically convert text to float values if possible. This applies only pyexcel-io where csv, tsv, csvz and tsvz formats are supported. By default, it does the detection. Setting it to False will turn on this behavior
auto_detect_datetime¶
Automatically convert text to python datetime if possible. This applies only pyexcel-io where csv, tsv, csvz and tsvz formats are supported. By default, it does the detection. Setting it to False will turn on this behavior
library¶
Name a pyexcel plugin to handle a file format. In the situation where multiple plugins were pip installed, it is confusing for pyexcel on which plugin to handle the file format. For example, both pyexcel-xlsx and pyexcel-xls reads xlsx format. Now since version 0.2.2, you can pass on library=”pyexcel-xls” to handle xlsx in a specific function call.
It is better to uninstall the unwanted pyexcel plugin using pip if two plugins for the same file type are not absolutely necessary.
Cookbook¶
merge_csv_to_a_book (filelist[, outfilename]) |
merge a list of csv files into a excel book |
merge_all_to_a_book (filelist[, outfilename]) |
merge a list of excel files into a excel book |
split_a_book (file_name[, outfilename]) |
Split a file into separate sheets |
extract_a_sheet_from_a_book (file_name, sheetname) |
Extract a sheet from a excel book |
Book¶
Here’s the entity relationship between Book, Sheet, Row and Column

Attribute¶
Book.number_of_sheets () |
Return the number of sheets |
Book.sheet_names () |
Return all sheet names |
Conversions¶
Book.bookdict |
Get/Set data in/from bookdict format |
Book.url |
Set data in url format |
Book.csv |
Get/Set data in/from csv format |
Book.tsv |
Get/Set data in/from tsv format |
Book.csvz |
Get/Set data in/from csvz format |
Book.tsvz |
Get/Set data in/from tsvz format |
Book.xls |
Get/Set data in/from xls format |
Book.xlsm |
Get/Set data in/from xlsm format |
Book.xlsx |
Get/Set data in/from xlsx format |
Book.ods |
Get/Set data in/from ods format |
Book.plain |
|
Book.simple |
|
Book.grid |
|
Book.pipe |
|
Book.orgtbl |
|
Book.rst |
|
Book.mediawiki |
|
Book.latex |
|
Book.latex_booktabs |
|
Book.json |
|
Book.html |
|
Book.stream |
Return a stream in which the book content is properly encoded |
Save changes¶
Book.save_as (filename) |
Save the content to a new file |
Book.save_to_memory (file_type[, stream]) |
Save the content to a memory stream |
Book.save_to_database (session, tables[, ...]) |
Save data in sheets to database tables |
Sheet¶
Constructor¶
Sheet ([sheet, name, name_columns_by_row, ...]) |
Two dimensional data container for filtering, formatting and iteration |
Attributes¶
Sheet.content |
Plain representation without headers |
Sheet.number_of_rows () |
The number of rows |
Sheet.number_of_columns () |
The number of columns |
Sheet.row_range () |
Utility function to get row range |
Sheet.column_range () |
Utility function to get column range |
Iteration¶
Sheet.rows () |
Returns a top to bottom row iterator |
Sheet.rrows () |
Returns a bottom to top row iterator |
Sheet.columns () |
Returns a left to right column iterator |
Sheet.rcolumns () |
Returns a right to left column iterator |
Sheet.enumerate () |
Iterate cell by cell from top to bottom and from left to right |
Sheet.reverse () |
Opposite to enumerate |
Sheet.vertical () |
Default iterator to go through each cell one by one from |
Sheet.rvertical () |
Default iterator to go through each cell one by one from rightmost |
Cell access¶
Sheet.cell_value (row, column[, new_value]) |
Random access to table cells |
Sheet.__getitem__ (aset) |
Row access¶
Sheet.row_at (index) |
Gets the data at the specified row |
Sheet.set_row_at (row_index, data_array) |
Update a row data range |
Sheet.delete_rows (row_indices) |
Delete one or more rows |
Sheet.extend_rows (rows) |
Take ordereddict to extend named rows |
Column access¶
Sheet.column_at (index) |
Gets the data at the specified column |
Sheet.set_column_at (column_index, data_array) |
Updates a column data range |
Sheet.delete_columns (column_indices) |
Delete one or more columns |
Sheet.extend_columns (columns) |
Take ordereddict to extend named columns |
Data series¶
Any column as row name¶
Sheet.name_columns_by_row (row_index) |
Use the elements of a specified row to represent individual columns |
Sheet.rownames |
Return row names if any |
Sheet.named_column_at (name) |
Get a column by its name |
Sheet.set_named_column_at (name, column_array) |
Take the first row as column names |
Sheet.delete_named_column_at (name) |
Works only after you named columns by a row |
Any row as column name¶
Sheet.name_rows_by_column (column_index) |
Use the elements of a specified column to represent individual rows |
Sheet.colnames |
Return column names if any |
Sheet.named_row_at (name) |
Get a row by its name |
Sheet.set_named_row_at (name, row_array) |
Take the first column as row names |
Sheet.delete_named_row_at (name) |
Take the first column as row names |
Conversion¶
Sheet.array |
Get/Set data in/from array format |
Sheet.records |
Get/Set data in/from records format |
Sheet.dict |
Get/Set data in/from dict format |
Sheet.url |
Set data in url format |
Sheet.csv |
Get/Set data in/from csv format |
Sheet.tsv |
Get/Set data in/from tsv format |
Sheet.csvz |
Get/Set data in/from csvz format |
Sheet.tsvz |
Get/Set data in/from tsvz format |
Sheet.xls |
Get/Set data in/from xls format |
Sheet.xlsm |
Get/Set data in/from xlsm format |
Sheet.xlsx |
Get/Set data in/from xlsx format |
Sheet.ods |
Get/Set data in/from ods format |
Sheet.plain |
|
Sheet.simple |
|
Sheet.grid |
|
Sheet.pipe |
|
Sheet.orgtbl |
|
Sheet.rst |
|
Sheet.mediawiki |
|
Sheet.latex |
|
Sheet.latex_booktabs |
|
Sheet.json |
|
Sheet.html |
|
Sheet.stream |
Return a stream in which the sheet content is properly encoded |
Formatting¶
Sheet.format (formatter) |
Apply a formatting action for the whole sheet |
Sheet.apply_formatter (aformatter) |
Apply the formatter immediately |
Filtering¶
Sheet.filter ([column_indices, row_indices]) |
Apply the filter with immediate effect |
Transformation¶
Sheet.transpose () |
Rotate the data table by 90 degrees |
Sheet.map (custom_function) |
Execute a function across all cells of the sheet |
Sheet.region (topleft_corner, bottomright_corner) |
Get a rectangle shaped data out |
Sheet.cut (topleft_corner, bottomright_corner) |
Get a rectangle shaped data out and clear them in position |
Sheet.paste (topleft_corner[, rows, columns]) |
Paste a rectangle shaped data after a position |
Save changes¶
Sheet.save_as (filename, **keywords) |
Save the content to a named file |
Sheet.save_to_memory (file_type[, stream]) |
Save the content to memory |
Sheet.save_to_database (session, table[, ...]) |
Save data in sheet to database table |
Internal API reference¶
This is intended for developers and hackers of pyexcel.
Data sheet representation¶
In inheritance order from parent to child
Matrix (array) |
The internal representation of a sheet data. |
SheetStream (name, payload) |
Memory efficient sheet representation |
BookStream ([sheets, filename, path]) |
Memory efficient book representation |
Developer’s guide¶
Developer’s guide¶
Here’s the architecture of pyexcel

Pull requests are welcome.
Development steps for code changes
- git clone https://github.com/pyexcel/pyexcel.git
- cd pyexcel
Upgrade your setup tools and pip. They are needed for development and testing only:
- pip install –upgrade setuptools “pip==7.1”
Then install relevant development requirements:
- pip install -r rnd_requirements.txt # if such a file exists
- pip install -r requirements.txt
- pip install -r tests/requirements.txt
In order to update test environment, and documentation, additional setps are required:
- pip install moban
- git clone https://github.com/pyexcel/pyexcel-commons.git commons
- make your changes in .moban.d directory, then issue command moban
What is rnd_requirements.txt¶
Usually, it is created when a dependent library is not released. Once the dependecy is installed(will be released), the future version of the dependency in the requirements.txt will be valid.
What is pyexcel-commons¶
Many information that are shared across pyexcel projects, such as: this developer guide, license info, etc. are stored in pyexcel-commons project.
What is .moban.d¶
.moban.d stores the specific meta data for the library.
How to test your contribution¶
Although nose and doctest are both used in code testing, it is adviable that unit tests are put in tests. doctest is incorporated only to make sure the code examples in documentation remain valid across different development releases.
On Linux/Unix systems, please launch your tests like this:
$ make test
On Windows systems, please issue this command:
> test.bat
Acceptance criteria¶
- Has fair amount of documentation
- Has Test cases written
- Has all code lines tested
- Passes all Travis CI builds
- Pythonic code please
- Agree on NEW BSD License for your contribution
Change log¶
Change log¶
0.4.4 - 06.02.2016¶
Updated¶
Removed¶
- remove get_{{file_type}}_stream functions from pyexcel.Sheet and pyexel.Book introduced since 0.4.3.
0.4.3 - 26.01.2017¶
Added¶
- ‘.stream’ attribte are attached to ~pyexcel.Sheet and ~pyexcel.Book to get direct access the underneath stream in responding to file type attributes, such as sheet.xls. it helps provide a custom stream to external world, for example, Sheet.stream.csv gives a text stream that contains csv formatted data. Book.stream.xls returns a xls format data in a byte stream.
Updated¶
- Better error reporting when an unknown parameters or unsupported file types were given to the signature functions.
0.4.2 - 17.01.2017¶
Updated¶
- Raise exception if the incoming sheet does not have column names. In other words, only sheet with column names could be saved to database. sheet with row names cannot be saved. The alternative is to tranpose the sheet, then name_columns_by_row and then save.
- fix iget_records where a non-uniform content should be given, e.g. [[“x”, “y”], [1, 2], [3]], some record would become non-uniform, e.g. key ‘y’ would be missing from the second record.
- skip_empty_rows is applicable when saving a python data structure to another data source. For example, if your array contains a row which is consisted of empty string, such as [‘’, ‘’, ‘’ ... ‘’], please specify skip_empty_rows=False in order to preserve it. This becomes subtle when you try save a python dictionary where empty rows is not easy to be spotted.
- #69: better documentation for save_book_as.
0.4.1 - 23.12.2016¶
0.4.0 - 22.12.2016¶
Added¶
- Flask-Excel issue 19 allow sheet_name parameter
- pyexcel-xls issue 11 case-incenstive for file_type. xls and XLS are treated in the same way
0.3.3 - 07.11.2016¶
0.3.0 - 28.10.2016¶
Added:¶
- file type setters for Sheet and Book, and its documentation
- iget_records returns a generator for a list of records and should have better memory performance, especially dealing with large csv files.
- iget_array returns a generator for a list of two dimensional array and should have better memory performance, especially dealing with large csv files.
- Enable pagination support, and custom row renderer via pyexcel-io v0.2.3
Updated¶
- Take isave_as out from save_as. Hence two functions are there for save a sheet as
- # 60: encode ‘utf-8’ if the console is of ascii encoding.
- # 59: custom row renderer
- # 56: set cell value does not work
- pyexcel.transpose becomes pyexcel.sheets.transpose
- iterator functions of pyexcel.Sheet were converted to generator
functions
- pyexcel.Sheet.enumerate()
- pyexcel.Sheet.reverse()
- pyexcel.Sheet.vertical()
- pyexcel.Sheet.rvertical()
- pyexcel.Sheet.rows()
- pyexcel.Sheet.rrows()
- pyexcel.Sheet.columns()
- pyexcel.Sheet.rcolumns()
- pyexcel.Sheet.named_rows()
- pyexcel.Sheet.named_columns()
- ~pyexcel.Sheet.save_to_memory and ~pyexcel.Book.save_to_memory return the actual content. No longer they will return a io object hence you cannot call getvalue() on them.
Removed:¶
- content and out_file as function parameters to the signature functions are no longer supported.
- SourceFactory and RendererFactory are removed
- The following methods are removed
- pyexcel.to_array
- pyexcel.to_dict
- pyexcel.utils.to_one_dimensional_array
- pyexcel.dict_to_array
- pyexcel.from_records
- pyexcel.to_records
- pyexcel.Sheet.filter has been re-implemented and all filters were
removed:
- pyexcel.filters.ColumnIndexFilter
- pyexcel.filters.ColumnFilter
- pyexcel.filters.RowFilter
- pyexcel.filters.EvenColumnFilter
- pyexcel.filters.OddColumnFilter
- pyexcel.filters.EvenRowFilter
- pyexcel.filters.OddRowFilter
- pyexcel.filters.RowIndexFilter
- pyexcel.filters.SingleColumnFilter
- pyexcel.filters.RowValueFilter
- pyexcel.filters.NamedRowValueFilter
- pyexcel.filters.ColumnValueFilter
- pyexcel.filters.NamedColumnValueFilter
- pyexcel.filters.SingleRowFilter
- the following functions have been removed
- add_formatter
- remove_formatter
- clear_formatters
- freeze_formatters
- add_filter
- remove_filter
- clear_filters
- freeze_formatters
- pyexcel.Sheet.filter has been re-implemented and all filters were
removed:
- pyexcel.formatters.SheetFormatter
0.2.3 - 11.07.2016¶
Updated:¶
- For python 3, do not seek 0 when saving to memory if sys.stdout is passed on. Hence, adding support for sys.stdin and sys.stdout.
0.2.2 - 01.06.2016¶
Updated:¶
- Explicit imports, no longer needed
- Depends on latest setuptools 18.0.1
- NotImplementedError will be raised if parameters to core functions are not supported, e.g. get_sheet(cannot_find_me_option=”will be thrown out as NotImplementedError”)
0.2.1 - 23.04.2016¶
Added:¶
- add pyexcel-text file types as attributes of pyexcel.Sheet and pyexcel.Book, related to issue 31
- auto import pyexcel-text if it is pip installed
Updated:¶
- code refactoring done for easy addition of sources.
- bug fix issue 29, Even if the format is a string it is displayed as a float
- pyexcel-text is no longer a plugin to pyexcel-io but to pyexcel.sources, see pyexcel-text issue #22