Stream APIs for big file : A set of two liners

When you are dealing with BIG excel files, you will want pyexcel to use constant memory.

This section shows you how to get data from your BIG excel files and how to export data to excel files in two lines at most, without eating all your computer memory.

Two liners for get data from big excel files

Get a list of dictionaries

Suppose you want to process the following coffee data:

Let’s get a list of dictionary out from the xls file:

>>> records = p.iget_records(file_name="your_file.xls")

And let’s check what do we have:

>>> for r in records:
...     print(f"{r['Serving Size']} of {r['Coffees']} has {r['Caffeine (mg)']} mg")
venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg
large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg
grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg
regular(16 oz.) of Panera Coffee Light Roast has 300 mg

Please do not forgot the second line to close the opened file handle:

>>> p.free_resources()

Get two dimensional array

Instead, what if you have to use pyexcel.get_array to do the same:

>>> for row in p.iget_array(file_name="your_file.xls", start_row=1):
...     print(f"{row[1]} of {row[0]} has {row[2]} mg")
venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg
large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg
grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg
regular(16 oz.) of Panera Coffee Light Roast has 300 mg

Again, do not forgot the second line:

>>> p.free_resources()

where start_row skips the header row.

Data export in one liners

Export an array

Suppose you have the following array:

>>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

And here is the code to save it as an excel file :

>>> p.isave_as(array=data, dest_file_name="example.xls")

But the following line is not required because the data source are not file sources:

>>> # p.free_resources()

Let’s verify it:

>>> p.get_sheet(file_name="example.xls")
pyexcel_sheet1:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+

And here is the code to save it as a csv file :

>>> p.isave_as(array=data,
...            dest_file_name="example.csv",
...            dest_delimiter=':')

Let’s verify it:

>>> with open("example.csv") as f:
...     for line in f.readlines():
...         print(line.rstrip())
...
1:2:3
4:5:6
7:8:9

Export a list of dictionaries

>>> records = [
...     {"year": 1903, "country": "Germany", "speed": "206.7km/h"},
...     {"year": 1964, "country": "Japan", "speed": "210km/h"},
...     {"year": 2008, "country": "China", "speed": "350km/h"}
... ]
>>> p.isave_as(records=records, dest_file_name='high_speed_rail.xls')

Export a dictionary of single key value pair

>>> henley_on_thames_facts = {
...     "area": "5.58 square meters",
...     "population": "11,619",
...     "civial parish": "Henley-on-Thames",
...     "latitude": "51.536",
...     "longitude": "-0.898"
... }
>>> p.isave_as(adict=henley_on_thames_facts, dest_file_name='henley.xlsx')

Export a dictionary of single dimensonal array

>>> ccs_insights = {
...     "year": ["2017", "2018", "2019", "2020", "2021"],
...     "smart phones": [1.53, 1.64, 1.74, 1.82, 1.90],
...     "feature phones": [0.46, 0.38, 0.30, 0.23, 0.17]
... }
>>> p.isave_as(adict=ccs_insights, dest_file_name='ccs.csv')
>>> p.free_resources()

Export a dictionary of two dimensional array as a book

Suppose you want to save the below dictionary to an excel file :

>>> a_dictionary_of_two_dimensional_arrays = {
...      'Sheet 1':
...          [
...              [1.0, 2.0, 3.0],
...              [4.0, 5.0, 6.0],
...              [7.0, 8.0, 9.0]
...          ],
...      'Sheet 2':
...          [
...              ['X', 'Y', 'Z'],
...              [1.0, 2.0, 3.0],
...              [4.0, 5.0, 6.0]
...          ],
...      'Sheet 3':
...          [
...              ['O', 'P', 'Q'],
...              [3.0, 2.0, 1.0],
...              [4.0, 3.0, 2.0]
...          ]
...  }

Here is the code:

>>> p.isave_book_as(
...    bookdict=a_dictionary_of_two_dimensional_arrays,
...    dest_file_name="book.xls"
... )

If you want to preserve the order of sheets in your dictionary, you have to pass on an ordered dictionary to the function itself. For example:

>>> from pyexcel._compact import OrderedDict
>>> data = OrderedDict()
>>> data.update({"Sheet 2": a_dictionary_of_two_dimensional_arrays['Sheet 2']})
>>> data.update({"Sheet 1": a_dictionary_of_two_dimensional_arrays['Sheet 1']})
>>> data.update({"Sheet 3": a_dictionary_of_two_dimensional_arrays['Sheet 3']})
>>> p.isave_book_as(bookdict=data, dest_file_name="book.xls")
>>> p.free_resources()

Let’s verify its order:

>>> import json
>>> book_dict = p.get_book_dict(file_name="book.xls")
>>> for key, item in book_dict.items():
...     print(json.dumps({key: item}))
{"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]}
{"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
{"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]}

Please notice that “Sheet 2” is the first item in the book_dict, meaning the order of sheets are preserved.

File format transcoding on one line

Note

Please note that the following file transcoding could be with zero line. Please install pyexcel-cli and you will do the transcode in one command. No need to open your editor, save the problem, then python run.

The following code does a simple file format transcoding from xls to csv:

>>> import pyexcel
>>> p.save_as(file_name="birth.xls", dest_file_name="birth.csv")

Again it is really simple. Let’s verify what we have gotten:

>>> sheet = p.get_sheet(file_name="birth.csv")
>>> sheet
birth.csv:
+-------+--------+----------+
| name  | weight | birth    |
+-------+--------+----------+
| Adam  | 3.4    | 03/02/15 |
+-------+--------+----------+
| Smith | 4.2    | 12/11/14 |
+-------+--------+----------+

Note

Please note that csv(comma separate value) file is pure text file. Formula, charts, images and formatting in xls file will disappear no matter which transcoding tool you use. Hence, pyexcel is a quick alternative for this transcoding job.

Let use previous example and save it as xlsx instead

>>> import pyexcel
>>> p.isave_as(file_name="birth.xls",
...            dest_file_name="birth.xlsx") # change the file extension

Again let’s verify what we have gotten:

>>> sheet = p.get_sheet(file_name="birth.xlsx")
>>> sheet
pyexcel_sheet1:
+-------+--------+----------+
| name  | weight | birth    |
+-------+--------+----------+
| Adam  | 3.4    | 03/02/15 |
+-------+--------+----------+
| Smith | 4.2    | 12/11/14 |
+-------+--------+----------+