Stream APIs for big file : A set of two liners ================================================================================ When you are dealing with **BIG** excel files, you will want **pyexcel** to use constant memory. This section shows you how to get data from your **BIG** excel files and how to export data to excel files in **two lines** at most, without eating all your computer memory. Two liners for get data from big excel files -------------------------------------------------------------------------------- Get a list of dictionaries ******************************************************************************** .. testcode:: :hide: >>> import os >>> import pyexcel as p >>> content=""" ... Coffees,Serving Size,Caffeine (mg) ... Starbucks Coffee Blonde Roast,venti(20 oz),475 ... Dunkin' Donuts Coffee with Turbo Shot,large(20 oz.),398 ... Starbucks Coffee Pike Place Roast,grande(16 oz.),310 ... Panera Coffee Light Roast,regular(16 oz.),300 ... """.strip() >>> sheet = p.get_sheet(file_content=content, file_type='csv') >>> sheet.save_as("your_file.xls") Suppose you want to process the following coffee data: .. pyexcel-table:: ---pyexcel:Huge list of coffeine drinks--- Coffees,Serving Size,Caffeine (mg) Starbucks Coffee Blonde Roast,venti(20 oz),475 Dunkin' Donuts Coffee with Turbo Shot,large(20 oz.),398 Starbucks Coffee Pike Place Roast,grande(16 oz.),310 Panera Coffee Light Roast,regular(16 oz.),300 Let's get a list of dictionary out from the xls file: .. code-block:: python >>> records = p.iget_records(file_name="your_file.xls") And let's check what do we have: .. code-block:: python >>> for r in records: ... print(f"{r['Serving Size']} of {r['Coffees']} has {r['Caffeine (mg)']} mg") venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg regular(16 oz.) of Panera Coffee Light Roast has 300 mg Please do not forgot the second line to close the opened file handle: .. code-block:: python >>> p.free_resources() Get two dimensional array ******************************************************************************** Instead, what if you have to use `pyexcel.get_array` to do the same: .. code-block:: python >>> for row in p.iget_array(file_name="your_file.xls", start_row=1): ... print(f"{row[1]} of {row[0]} has {row[2]} mg") venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg regular(16 oz.) of Panera Coffee Light Roast has 300 mg Again, do not forgot the second line: .. code-block:: python >>> p.free_resources() where `start_row` skips the header row. Data export in one liners --------------------------------------------- Export an array ********************** Suppose you have the following array: .. code-block:: python >>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] And here is the code to save it as an excel file : .. code-block:: python >>> p.isave_as(array=data, dest_file_name="example.xls") But the following line is not required because the data source are not file sources: .. code-block:: python >>> # p.free_resources() Let's verify it: .. code-block:: python >>> p.get_sheet(file_name="example.xls") pyexcel_sheet1: +---+---+---+ | 1 | 2 | 3 | +---+---+---+ | 4 | 5 | 6 | +---+---+---+ | 7 | 8 | 9 | +---+---+---+ .. testcode:: :hide: >>> import os >>> os.unlink("example.xls") And here is the code to save it as a csv file : .. code-block:: python >>> p.isave_as(array=data, ... dest_file_name="example.csv", ... dest_delimiter=':') Let's verify it: .. code-block:: python >>> with open("example.csv") as f: ... for line in f.readlines(): ... print(line.rstrip()) ... 1:2:3 4:5:6 7:8:9 Export a list of dictionaries ********************************** .. code-block:: python >>> records = [ ... {"year": 1903, "country": "Germany", "speed": "206.7km/h"}, ... {"year": 1964, "country": "Japan", "speed": "210km/h"}, ... {"year": 2008, "country": "China", "speed": "350km/h"} ... ] >>> p.isave_as(records=records, dest_file_name='high_speed_rail.xls') Export a dictionary of single key value pair ******************************************************************************** .. code-block:: python >>> henley_on_thames_facts = { ... "area": "5.58 square meters", ... "population": "11,619", ... "civial parish": "Henley-on-Thames", ... "latitude": "51.536", ... "longitude": "-0.898" ... } >>> p.isave_as(adict=henley_on_thames_facts, dest_file_name='henley.xlsx') Export a dictionary of single dimensonal array ******************************************************************************** .. code-block:: python >>> ccs_insights = { ... "year": ["2017", "2018", "2019", "2020", "2021"], ... "smart phones": [1.53, 1.64, 1.74, 1.82, 1.90], ... "feature phones": [0.46, 0.38, 0.30, 0.23, 0.17] ... } >>> p.isave_as(adict=ccs_insights, dest_file_name='ccs.csv') >>> p.free_resources() Export a dictionary of two dimensional array as a book ******************************************************************************** Suppose you want to save the below dictionary to an excel file : .. code-block:: python >>> a_dictionary_of_two_dimensional_arrays = { ... 'Sheet 1': ... [ ... [1.0, 2.0, 3.0], ... [4.0, 5.0, 6.0], ... [7.0, 8.0, 9.0] ... ], ... 'Sheet 2': ... [ ... ['X', 'Y', 'Z'], ... [1.0, 2.0, 3.0], ... [4.0, 5.0, 6.0] ... ], ... 'Sheet 3': ... [ ... ['O', 'P', 'Q'], ... [3.0, 2.0, 1.0], ... [4.0, 3.0, 2.0] ... ] ... } Here is the code: .. code-block:: python >>> p.isave_book_as( ... bookdict=a_dictionary_of_two_dimensional_arrays, ... dest_file_name="book.xls" ... ) If you want to preserve the order of sheets in your dictionary, you have to pass on an ordered dictionary to the function itself. For example: .. code-block:: python >>> from pyexcel._compact import OrderedDict >>> data = OrderedDict() >>> data.update({"Sheet 2": a_dictionary_of_two_dimensional_arrays['Sheet 2']}) >>> data.update({"Sheet 1": a_dictionary_of_two_dimensional_arrays['Sheet 1']}) >>> data.update({"Sheet 3": a_dictionary_of_two_dimensional_arrays['Sheet 3']}) >>> p.isave_book_as(bookdict=data, dest_file_name="book.xls") >>> p.free_resources() Let's verify its order: .. code-block:: python >>> import json >>> book_dict = p.get_book_dict(file_name="book.xls") >>> for key, item in book_dict.items(): ... print(json.dumps({key: item})) {"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]} {"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]} {"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]} Please notice that "Sheet 2" is the first item in the *book_dict*, meaning the order of sheets are preserved. File format transcoding on one line ------------------------------------------- .. note:: Please note that the following file transcoding could be with zero line. Please install pyexcel-cli and you will do the transcode in one command. No need to open your editor, save the problem, then python run. .. testcode:: :hide: >>> import datetime >>> data = [ ... ["name", "weight", "birth"], ... ["Adam", 3.4, datetime.date(2015, 2, 3)], ... ["Smith", 4.2, datetime.date(2014, 11, 12)] ... ] >>> p.isave_as(array=data, dest_file_name="birth.xls") The following code does a simple file format transcoding from xls to csv: .. code-block:: python >>> import pyexcel >>> p.save_as(file_name="birth.xls", dest_file_name="birth.csv") Again it is really simple. Let's verify what we have gotten: .. code-block:: python >>> sheet = p.get_sheet(file_name="birth.csv") >>> sheet birth.csv: +-------+--------+----------+ | name | weight | birth | +-------+--------+----------+ | Adam | 3.4 | 03/02/15 | +-------+--------+----------+ | Smith | 4.2 | 12/11/14 | +-------+--------+----------+ .. note:: Please note that csv(comma separate value) file is pure text file. Formula, charts, images and formatting in xls file will disappear no matter which transcoding tool you use. Hence, pyexcel is a quick alternative for this transcoding job. Let use previous example and save it as xlsx instead .. code-block:: python >>> import pyexcel >>> p.isave_as(file_name="birth.xls", ... dest_file_name="birth.xlsx") # change the file extension Again let's verify what we have gotten: .. code-block:: python >>> sheet = p.get_sheet(file_name="birth.xlsx") >>> sheet pyexcel_sheet1: +-------+--------+----------+ | name | weight | birth | +-------+--------+----------+ | Adam | 3.4 | 03/02/15 | +-------+--------+----------+ | Smith | 4.2 | 12/11/14 | +-------+--------+----------+ .. testcode:: :hide: >>> import os >>> os.unlink('ccs.csv') >>> os.unlink('book.xls')