Openpyxl: Master Excel with Python

Openpyxl practical guide for Python: read, write, and format Excel workbooks with code samples. Covers installation, core APIs, and real-world automation workflows.

XLS Library
XLS Library Team
Quick Answer

Openpyxl is a Python library that enables reading, writing, and formatting Excel workbooks (.xlsx and .xlsm) without Excel installed. It’s ideal for data pipelines, automation tasks, and reproducible analyses. This guide from XLS Library shows installation, core concepts, and practical workflows to automate Excel tasks with Python using openpyxl.

Why openpyxl fits Python data pipelines

According to XLS Library, openpyxl is a mature, well-documented library for manipulating Excel workbooks from Python. It provides a clean, object-oriented API to create sheets, populate cells, apply styles, and export finished files. This fits data pipelines where Excel is the input or output format, or where you need reproducible reports generated by code. While it is not a full Excel replacement, openpyxl streamlines repetitive tasks, reduces manual errors, and enables automation from data extraction to presentation-ready workbooks.

Python
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws.title = "Data"
ws['A1'] = "Sample"
ws['B1'] = 42
wb.save("sample.xlsx")

What this code does: creates a workbook, sets a sheet title, writes simple values, and saves the file. This is the entry point for more advanced data workflows using openpyxl.

Getting started: install and basic workbook creation

To begin, install openpyxl via pip and verify the installation. The following commands work across platforms (Windows, macOS, Linux).

Bash
pip install openpyxl
Python
# Quick check to confirm installation and version
import openpyxl
print(openpyxl.__version__)

Why this matters: if the import succeeds and prints a version, the API is ready to use. openpyxl does not need Excel installed, which makes it suitable for scripting, testing, and automating Excel-related tasks in Python projects.

When you’re ready to create a workbook from scratch, you’ll typically start with Workbook() and active sheet access, as shown above.

Creating a workbook and writing data

Openpyxl makes it straightforward to populate rows and columns. The following example writes a small table and saves it as data.xlsx.

Python
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws.title = "Sales"
headers = ["Month", "Revenue"]
ws.append(headers)
for i, m in enumerate(["Jan", "Feb", "Mar"], start=1):
    ws.append([m, i * 1000])
wb.save("data.xlsx")

What happens here: we create a sheet named Sales, append a header row, then loop to fill data. Using append is convenient for building rows from lists. If you prefer cell-by-cell control, you can assign values with ws.cell(row=r, column=c, value=...).
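Here is a minimal sketch of the cell-by-cell alternative. It writes the same Sales table with ws.cell(). The output file name data_cells.xlsx is an arbitrary choice for this example. Note that row and column numbers start at 1, not 0.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.title = "Sales"

rows = [("Month", "Revenue"), ("Jan", 1000), ("Feb", 2000), ("Mar", 3000)]

# openpyxl rows and columns are 1-based: row=1, column=1 is cell A1
for r, (month, revenue) in enumerate(rows, start=1):
    ws.cell(row=r, column=1, value=month)
    ws.cell(row=r, column=2, value=revenue)

wb.save("data_cells.xlsx")
```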

One common pattern is building data structures (lists of dictionaries) and writing them with a small helper function to map keys to columns.
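The helper pattern might look like this. write_records is a hypothetical name used for illustration, not an openpyxl function. The sketch assumes each record is a dict and that the caller passes the column order explicitly.

```python
from openpyxl import Workbook

def write_records(ws, records, columns):
    """Write a header row, then one row per dict (hypothetical helper).

    Keys missing from a record become empty cells.
    """
    ws.append(columns)
    for record in records:
        # dict.get returns None for absent keys; openpyxl leaves such cells empty
        ws.append([record.get(col) for col in columns])

records = [
    {"Month": "Jan", "Revenue": 1000},
    {"Month": "Feb", "Revenue": 2000, "Note": "promo"},
    {"Revenue": 3000, "Month": "Mar"},  # key order differs; columns still line up
]

wb = Workbook()
ws = wb.active
ws.title = "Sales"
write_records(ws, records, columns=["Month", "Revenue", "Note"])
wb.save("records.xlsx")
```

Fixing the column order up front keeps the layout stable when dicts list their keys in different orders or omit some keys.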

Reading data from Excel using openpyxl

Reading data is often as important as writing it. The following snippet loads an existing workbook in read-only mode, skips the header, and prints the month and revenue from the first two columns of each data row. It streams rows with iter_rows instead of indexing cells one by one.

Python
from openpyxl import load_workbook
wb = load_workbook("data.xlsx", read_only=True, data_only=True)
ws = wb.active
for row in ws.iter_rows(min_row=2, max_col=2, values_only=True):
    month, revenue = row
    print(month, revenue)
wb.close()  # read-only workbooks keep the file open until closed

Notes:

  • read_only=True reduces memory usage for large files.
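  • data_only=True returns the value cached for each formula cell the last time a spreadsheet application saved the file, not the formula itself. Files that were never saved by such an application (including files written by openpyxl) have no cached values, so those cells read as None.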
  • iter_rows with values_only=True yields tuples of cell values for easy processing.

To read the formulas themselves, omit data_only (it defaults to False). cell.value then holds the formula string, such as "=SUM(B2:B4)", in both normal and read-only mode.
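The difference between the two modes is easy to check with a workbook that openpyxl wrote itself. This sketch needs nothing beyond openpyxl, and formulas.xlsx is an arbitrary file name.

```python
from openpyxl import Workbook, load_workbook

# Build a small workbook with a formula; openpyxl stores only the formula text
wb = Workbook()
ws = wb.active
ws.append(["Month", "Sales"])
for month, sales in [("Jan", 1000), ("Feb", 1500), ("Mar", 1200)]:
    ws.append([month, sales])
ws["C2"] = "=SUM(B2:B4)"
wb.save("formulas.xlsx")

# Default (data_only=False): the cell value is the formula string
formula_text = load_workbook("formulas.xlsx").active["C2"].value

# data_only=True: the cached result, which does not exist yet because
# no spreadsheet application has calculated and saved this file
cached_value = load_workbook("formulas.xlsx", data_only=True).active["C2"].value

print(formula_text)  # =SUM(B2:B4)
print(cached_value)  # None
```

A workbook loaded with data_only=True holds values in place of formulas. Saving it therefore writes those values (or empty cells) where the formulas used to be. Load without data_only when you intend to modify and save the file.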

Iterating rows and columns efficiently

For larger datasets, looking up cells one at a time (for example, calling ws.cell() in a loop) is slow, especially in read-only mode. A recommended approach is to use iter_rows with values_only=True to stream data, or use ws.values to iterate across all rows. This snippet demonstrates streaming a simple transformation:

Python
from openpyxl import load_workbook
wb = load_workbook("data.xlsx", read_only=True, data_only=True)
ws = wb.active
# Transform: build a list of dicts keyed by the header row.
# values_only=True yields plain values (not cells), so use the header tuple directly.
headers = list(next(ws.iter_rows(min_row=1, max_row=1, values_only=True)))
data = []
for row in ws.iter_rows(min_row=2, values_only=True):
    data.append(dict(zip(headers, row)))
wb.close()
print(data[:3])

This pattern helps when converting Excel data into Python-native structures for analysis or further processing. If memory permits, you can also write transformed data directly to a new workbook or a CSV file for downstream pipelines.
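One way to hand the data to a CSV-based pipeline is to stream ws.values straight into Python's standard csv module. The sketch first recreates the small data.xlsx from the earlier example so it runs on its own. data.csv is an arbitrary output name.

```python
import csv
from openpyxl import Workbook, load_workbook

# Recreate the sample file from the earlier example so this sketch is standalone
src = Workbook()
src.active.append(["Month", "Revenue"])
for i, m in enumerate(["Jan", "Feb", "Mar"], start=1):
    src.active.append([m, i * 1000])
src.save("data.xlsx")

wb = load_workbook("data.xlsx", read_only=True, data_only=True)
ws = wb.active

rows = ws.values       # generator of row tuples
headers = next(rows)   # first row holds the column names

with open("data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    for row in rows:   # remaining rows are written one at a time
        writer.writerow(row)

wb.close()  # read-only workbooks keep the source file open until closed
```

Writing row by row skips the intermediate list of dicts, so memory use does not grow with the sheet size. Empty cells arrive as None, which csv.writer writes as an empty field.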

Formatting cells and applying styles

Openpyxl supports styling to improve readability of generated reports. You can set fonts, colors, alignment, and fills. The following example demonstrates a header row with bold white text on a blue fill:

Python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill
wb = Workbook()
ws = wb.active
ws.append(["Month", "Revenue"])
ws.append(["Jan", 12000])
# Style header row (first row)
header = ws[1]
for cell in header:
    cell.font = Font(bold=True, color="FFFFFF")
    cell.fill = PatternFill(start_color="4F81BD", end_color="4F81BD", fill_type="solid")
wb.save("styled.xlsx")

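Why styling matters: styled workbooks are easier to read and share with non-technical stakeholders. A common approach is to separate data, headers, and totals using color, borders, and bold fonts. For number formatting (e.g., decimals, currency, percentages), set a cell's number_format to an Excel format code such as "#,##0.00". openpyxl.styles.numbers defines constants for the built-in codes. A NamedStyle bundles a number format with fonts, fills, and borders so the same look can be applied consistently across sheets.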
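Here is a short sketch of both options applied to a small sales table. The style name "amount" and the sample margin values are invented for illustration. "#,##0.00" and "0.0%" are standard Excel format codes.

```python
from openpyxl import Workbook
from openpyxl.styles import Font, NamedStyle

wb = Workbook()
ws = wb.active
ws.append(["Month", "Revenue", "Margin"])
ws.append(["Jan", 12000, 0.253])
ws.append(["Feb", 15500.5, 0.198])

# One-off formatting: set number_format directly on each cell
for cell in ws["C"][1:]:          # skip the header in C1
    cell.number_format = "0.0%"

# Reusable formatting: register a named style once, then apply it by name
amount = NamedStyle(name="amount")
amount.number_format = "#,##0.00"
amount.font = Font(bold=True)
wb.add_named_style(amount)

for cell in ws["B"][1:]:
    cell.style = "amount"

wb.save("formatted.xlsx")
```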

Formulas, charts, and images: extending Excel with openpyxl

Openpyxl can populate formulas and, to some extent, embed charts into workbooks. The following example writes a simple sum formula and adds a small bar chart to visualize values:

Python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference
wb = Workbook()
ws = wb.active
ws.append(["Month", "Sales"])
ws.append(["Jan", 1000])
ws.append(["Feb", 1500])
ws.append(["Mar", 1200])
# Write a formula in C2 that sums B2:B4
ws['C2'] = "=SUM(B2:B4)"
# Create and attach a simple bar chart
chart = BarChart()
data = Reference(ws, min_col=2, min_row=1, max_row=4)
chart.add_data(data, titles_from_data=True)
ws.add_chart(chart, "E2")
wb.save("charts.xlsx")

Caveats: openpyxl can generate charts but rendering options are limited compared to Excel’s UI. Formulas are stored as strings and evaluated by Excel when the file is opened. For complex charts, consider exporting data and using Excel’s native chart tools or BI tools for visualization.
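Within those limits, a few attributes go a long way. This sketch extends the earlier chart with a title, axis titles, and month labels on the category axis via set_categories. The output name charts_labeled.xlsx is arbitrary. The axis delete lines are a precaution: some openpyxl 3.1.x releases produce charts whose axes Excel hides unless they are explicitly un-hidden. Setting them is harmless otherwise.

```python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
ws = wb.active
ws.append(["Month", "Sales"])
for month, sales in [("Jan", 1000), ("Feb", 1500), ("Mar", 1200)]:
    ws.append([month, sales])

chart = BarChart()
chart.title = "Monthly sales"
chart.x_axis.title = "Month"
chart.y_axis.title = "Sales"
chart.x_axis.delete = False   # precaution: keep the axes visible
chart.y_axis.delete = False

# Values include the header row so the series takes its name from B1
data = Reference(ws, min_col=2, min_row=1, max_row=4)
# Category labels (Jan, Feb, Mar) come from column A, excluding the header
cats = Reference(ws, min_col=1, min_row=2, max_row=4)
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)

ws.add_chart(chart, "E2")
wb.save("charts_labeled.xlsx")
```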

Working with large files and memory considerations

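When dealing with big datasets, avoid loading entire files into memory. Use write-only mode to append data efficiently, or process data in chunks. The example below streams 10,000 rows in write-only mode. Because appended rows are written out rather than kept as cell objects, the same pattern scales to much larger outputs, up to Excel's limit of 1,048,576 rows per worksheet: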

Python
from openpyxl import Workbook
wb = Workbook(write_only=True)
ws = wb.create_sheet()  # write-only workbooks start with no sheets
# Simulate streaming data: 10,000 rows appended one at a time
for i in range(10_000):
    ws.append([i, i * 2 if i % 2 == 0 else i / 2])
wb.save("large.xlsx")

If you must read very large files, prefer read_only=True and iterate with iter_rows as shown earlier to keep memory usage in check. For stable automation, consider batching writes to separate output files and validating integrity after each batch.
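Batched reading can be sketched with itertools.islice over the row generator. iter_batches is a hypothetical helper, and the 1,000-row batch size is an arbitrary default. The sketch writes its own sample file so it runs standalone.

```python
from itertools import islice
from openpyxl import Workbook, load_workbook

def iter_batches(path, batch_size=1000):
    """Yield lists of row tuples, batch_size rows at a time (hypothetical helper)."""
    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        rows = wb.active.iter_rows(values_only=True)
        while True:
            batch = list(islice(rows, batch_size))
            if not batch:
                break
            yield batch
    finally:
        wb.close()  # release the file even if the caller stops early

# Build a 2,500-row sample file in write-only mode
out = Workbook(write_only=True)
sheet = out.create_sheet()
for i in range(2_500):
    sheet.append([i, i * 2])
out.save("batches.xlsx")

batch_sizes = [len(batch) for batch in iter_batches("batches.xlsx", batch_size=1000)]
print(batch_sizes)  # [1000, 1000, 500]
```

Each batch can be validated, transformed, or written to its own output file before the next one is read.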

Debugging and common pitfalls

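Common issues with openpyxl involve mixing read and write modes (read-only worksheets cannot be modified, and write-only worksheets cannot be read back). Others are off-by-one errors from the 1-based row and column indices, and mismatched data types such as numbers stored as text. A cautious approach is to validate input data before writing and to handle exceptions gracefully. The snippet below demonstrates basic error handling when loading a workbook: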

Python
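from openpyxl import load_workbook
from openpyxl.utils.exceptions import InvalidFileException
try:
    wb = load_workbook("data.xlsx")
except FileNotFoundError:
    print("The file was not found. Create it first or provide a valid path.")
except InvalidFileException as e:
    # Raised for unsupported formats, such as legacy .xls files
    print("Unsupported file format:", e)
except Exception as e:
    print("An unexpected error occurred:", e)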
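Misaligned indices usually come from mixing Python's 0-based lists with openpyxl's 1-based rows and columns. The sketch below shows the conversions using helpers from openpyxl.utils.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter, column_index_from_string

wb = Workbook()
ws = wb.active

# openpyxl indices are 1-based: row=1, column=1 is A1
ws.cell(row=1, column=1, value="top-left")
print(ws["A1"].value)                    # top-left

# Convert between column numbers and letters instead of hand-computing them
print(get_column_letter(28))             # AB
print(column_index_from_string("AB"))    # 28

# A Python list index i maps to spreadsheet column i + 1
values = ["x", "y", "z"]
for i, v in enumerate(values):
    ws.cell(row=2, column=i + 1, value=v)
print([c.value for c in ws[2]])          # ['x', 'y', 'z']

# Row or column 0 is rejected outright
try:
    ws.cell(row=0, column=1)
except ValueError as exc:
    print("rejected:", exc)
```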

Best practice: keep a small tests directory with sample workbooks to validate scripts during development. This reduces surprises when run in CI or automated pipelines.
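A round-trip check of this kind might look like the following pytest-style test. build_report is a hypothetical stand-in for your own script. The test writes into a temporary directory so it never touches real files. pytest collects functions named test_*, and the final call lets the file also run as a plain script.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font

def build_report(path):
    """Hypothetical report builder standing in for your own script."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Sales"
    ws.append(["Month", "Revenue"])
    for cell in ws[1]:
        cell.font = Font(bold=True)
    ws.append(["Jan", 1000])
    ws.append(["Feb", 2000])
    ws["B4"] = "=SUM(B2:B3)"
    wb.save(path)

def test_report_round_trip():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "report.xlsx"
        build_report(path)

        wb = load_workbook(path)            # default mode keeps formulas and styles
        ws = wb["Sales"]
        assert ws["A1"].value == "Month"
        assert ws["A1"].font.bold           # header style survived the save
        assert ws["B2"].value == 1000       # numbers come back as numbers, not text
        assert ws["B4"].value == "=SUM(B2:B3)"

test_report_round_trip()
```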

Steps

Estimated time: 30-60 minutes

  1. Install and verify environment

    Install the openpyxl package with pip and confirm the installation by printing the version. This ensures the Python environment is prepared for workbook operations.

    Tip: Use a virtual environment to isolate dependencies.
  2. Create a workbook and write data

    Initialize a workbook, create a data sheet, and populate a few rows. Saving the file produces a tangible Excel workbook to inspect.

    Tip: Prefer `append` for simple row-wise additions.
  3. Read data back for validation

    Load the workbook in read-only mode when possible and iterate rows to verify content matches expectations.

    Tip: Use `values_only=True` to simplify value access.
  4. Apply formatting and formulas

    Add a header style and include simple formulas. This demonstrates how to enrich data in code.

    Tip: Formulas are stored as strings and evaluated by Excel.
  5. Persist and test

    Save the final workbook and re-open to verify styles, data types, and formulas render as intended.

    Tip: Automate a quick re-check to catch regressions.
Warning: Avoid overwriting important files; write to a new filename or back up the originals first.
Pro Tip: Use write-only mode (write_only=True) when writing large datasets and read-only mode (read_only=True) when reading them, to minimize memory usage.
Note: When sharing generated workbooks, ensure formulas reference correct cell ranges.
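One way to follow the warning above is to copy the existing file aside before every save. save_with_backup is a hypothetical helper, and the report.bak.xlsx naming is an arbitrary convention. The backup keeps the .xlsx suffix because load_workbook checks file extensions before opening a file.

```python
import shutil
from pathlib import Path

from openpyxl import Workbook, load_workbook

def save_with_backup(wb, path):
    """Save wb to path, first copying any existing file aside.

    save_with_backup is a hypothetical helper, not an openpyxl API.
    """
    path = Path(path)
    if path.exists():
        backup = path.with_name(f"{path.stem}.bak{path.suffix}")  # report.bak.xlsx
        shutil.copy2(path, backup)  # copies contents and file metadata
    wb.save(path)

wb = Workbook()
wb.active["A1"] = "v1"
save_with_backup(wb, "report.xlsx")

wb.active["A1"] = "v2"
save_with_backup(wb, "report.xlsx")  # the v1 file is copied to report.bak.xlsx first

print(load_workbook("report.bak.xlsx").active["A1"].value)  # v1
print(load_workbook("report.xlsx").active["A1"].value)      # v2
```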

Prerequisites

Required

Optional

  • Code editor or IDE

Commands

| Action | Command | Notes |
| --- | --- | --- |
| Install openpyxl | `pip install openpyxl` | Upgrade with `pip install --upgrade openpyxl` |
| Run a Python script | `python script.py` | Ensure you're in the script's directory |
| Check installed version | `python -c 'import openpyxl; print(openpyxl.__version__)'` | |
| Create a quick workbook inline | Here-doc shown below | For quick experiments in a POSIX shell |
| Upgrade openpyxl | `pip install --upgrade openpyxl` | |

Bash
python - << 'PY'
from openpyxl import Workbook
wb = Workbook()
wb.save('sample.xlsx')
PY

People Also Ask

What is openpyxl?

openpyxl is a Python library for reading, writing, and manipulating Excel files in the .xlsx and .xlsm formats. It allows automation of common Excel tasks from Python without needing Excel installed.


Can openpyxl edit existing workbooks?

Yes. You can load an existing workbook with load_workbook, modify cells, add sheets, update formulas, and save the changes back to disk.

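Here is a minimal edit-in-place sketch, assuming a workbook with a Sales sheet. The script creates one first so it runs standalone. The file names are arbitrary.

```python
from openpyxl import Workbook, load_workbook

# Create a starting file so the sketch runs on its own
seed = Workbook()
seed.active.title = "Sales"
seed.active.append(["Month", "Revenue"])
seed.active.append(["Jan", 1000])
seed.save("existing.xlsx")

wb = load_workbook("existing.xlsx")     # default mode: editable, formulas kept
ws = wb["Sales"]
ws["B2"] = 1250                         # update a cell
ws.append(["Feb", 1400])                # add a row after the last used row
ws["B4"] = "=SUM(B2:B3)"                # add a formula
summary = wb.create_sheet("Summary")    # add a sheet
summary["A1"] = "Total revenue"
summary["B1"] = "=Sales!B4"
wb.save("existing_edited.xlsx")         # new filename keeps the original intact
```

There are two caveats. For .xlsm files, pass keep_vba=True to load_workbook so the macros survive the save. The openpyxl documentation also warns that content it does not read, such as shapes, is lost when an existing file is opened and saved.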

Does openpyxl support charts?

openpyxl can create basic charts using the openpyxl.chart module and attach them to worksheets, though advanced chart customizations are best done in Excel or BI tools.


Is openpyxl cross-platform?

Yes. openpyxl runs on Windows, macOS, and Linux as a pure-Python library, provided Python and dependencies are installed.


What are common pitfalls?

Common issues include wrong file paths, mixing read-only and write-only modes, and loading large files fully into memory. Use read_only=True when reading and write_only=True when writing big datasets.


Do formulas update automatically?

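No. openpyxl stores formulas as text and never calculates them. Excel recalculates them when the workbook is opened; other applications may need a manual recalculation. Until the file has been calculated and saved by a spreadsheet application, reading it with data_only=True returns None for formula cells.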


The Essentials

  • Install openpyxl and verify version
  • Create workbooks and write data with append
  • Read data efficiently with iter_rows
  • Apply styling to improve readability
  • Use formulas and charts for insights
