Python Data Visualization: Matplotlib vs Seaborn vs Altair
This guide compares three popular Python data visualization libraries: Matplotlib, Seaborn, and Altair (Vega-Altair). Each library has its own strengths, weaknesses, and ideal use cases. This comparison will help you choose the right tool for your specific visualization needs.
Quick Reference Comparison
Feature | Matplotlib | Seaborn | Altair |
---|---|---|---|
Release Year | 2003 | 2013 | 2016 |
Foundation | Standalone | Built on Matplotlib | Based on Vega-Lite |
Philosophy | Imperative | Statistical | Declarative |
Abstraction Level | Low | Medium | High |
Learning Curve | Steep | Moderate | Gentle |
Code Verbosity | High | Medium | Low |
Customization | Extensive | Good | Limited |
Statistical Integration | Manual | Built-in | Good |
Interactive Features | Limited | Limited | Excellent |
Performance with Large Data | Good | Moderate | Limited |
Community & Resources | Extensive | Good | Growing |
Matplotlib
Matplotlib is the foundational plotting library in Python’s data visualization ecosystem.
Strengths:
- Fine-grained control: Almost every aspect of a visualization can be customized
- Versatility: Can create virtually any type of static plot
- Maturity: Extensive documentation and community support
- Ecosystem integration: Many libraries integrate with or build upon Matplotlib
- Performance: Handles large datasets well
Weaknesses:
- Verbose syntax: Requires many lines of code for complex visualizations
- Steep learning curve: Many functions and parameters to learn
- Default aesthetics: Basic default styling (though this has improved)
- Limited interactivity: Primarily designed for static plots
Example Code:
import matplotlib.pyplot as plt
import numpy as np
# Sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Create figure and axis
fig, ax = plt.subplots(figsize=(8, 4))
# Plot data
ax.plot(x, y, label='Sine Wave')
# Add grid, legend, title and labels
ax.grid(True)
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_title('Simple Sine Wave Plot')
ax.legend()
plt.tight_layout()
plt.show()
When to use Matplotlib:
- You need complete control over every aspect of your visualization
- You’re creating complex, publication-quality figures
- You’re working with specialized plot types not available in higher-level libraries
- You need to integrate with many other Python libraries
- You’re working with large datasets
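As a rough illustration of the fine-grained control described above, the sketch below adjusts spines, ticks, axis limits, and an annotation on a single Axes. The annotation text, tick positions, and limits are arbitrary choices for this example, and the Agg backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y, color="tab:blue", linewidth=2, label="Sine Wave")

# Hide the top and right spines for a cleaner frame
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# Control tick positions, tick label size, and axis limits explicitly
ax.set_xticks(np.arange(0, 11, 2))
ax.tick_params(axis="both", labelsize=10)
ax.set_ylim(-1.5, 1.5)

# Point at the first peak of the sine curve (x = pi/2, y = 1)
ax.annotate(
    "First peak",
    xy=(np.pi / 2, 1.0),
    xytext=(3.0, 1.3),
    arrowprops=dict(arrowstyle="->"),
)

ax.legend(loc="lower left")
fig.tight_layout()
```

Every element touched here is an ordinary object on the Axes, so the same pattern extends to other figure parts.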
Seaborn
Seaborn is a statistical visualization library built on top of Matplotlib.
Strengths:
- Aesthetic defaults: Beautiful out-of-the-box styling
- Statistical integration: Built-in support for statistical visualizations
- Dataset awareness: Works well with pandas DataFrames
- Simplicity: Fewer lines of code than Matplotlib for common plots
- High-level functions: Specialized plots like lmplot, catplot, etc.
Weaknesses:
- Limited customization: Some advanced customizations require falling back to Matplotlib
- Performance: Can be slower with very large datasets
- Restricted scope: Focused on statistical visualization, not general-purpose plotting
Example Code:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Create sample data
x = np.linspace(0, 10, 100)
y = np.sin(x) + np.random.normal(0, 0.2, size=len(x))
data = pd.DataFrame({'x': x, 'y': y})
# Set the aesthetic style
sns.set_theme(style="whitegrid")
# Create the plot
plt.figure(figsize=(8, 4))
sns.lineplot(data=data, x='x', y='y', label='Noisy Sine Wave')
sns.regplot(data=data, x='x', y='y', scatter=False, label='Regression Line')
# Add title and labels
plt.title('Seaborn Line Plot with Regression')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.tight_layout()
plt.show()
When to use Seaborn:
- You want attractive visualizations with minimal code
- You’re performing statistical analysis
- You’re working with pandas DataFrames
- You’re creating common statistical plots (distributions, relationships, categorical plots)
- You want the power of Matplotlib with a simpler interface
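For the categorical plots mentioned above, a minimal sketch with `catplot` might look like the following. The column names `group` and `value` and the synthetic data are assumptions made for illustration. The Agg backend is set only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: three groups with slightly different means
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 50),
    "value": np.concatenate([rng.normal(loc, 1.0, 50) for loc in (0.0, 0.5, 1.0)]),
})

sns.set_theme(style="whitegrid")

# catplot is a figure-level function: it creates its own figure and returns a FacetGrid
grid = sns.catplot(data=data, x="group", y="value", kind="box", height=4, aspect=1.5)
grid.set_axis_labels("Group", "Value")
grid.ax.set_title("Box plot via catplot")
```

Because `catplot` returns a `FacetGrid` rather than an Axes, further Matplotlib customization goes through `grid.ax` (or `grid.axes` when there are several facets).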
Altair (Vega-Altair)
Altair is a declarative statistical visualization library based on Vega-Lite.
Strengths:
- Declarative approach: Focus on what to visualize, not how to draw it
- Concise syntax: Very readable, clear code
- Layered grammar of graphics: Intuitive composition of plots
- Interactive visualizations: Built-in support for interactive features
- JSON output: Visualizations can be saved as JSON specifications
Weaknesses:
- Performance limitations: Not ideal for large datasets; by default, Altair raises an error when the data has more than 5,000 rows
- Limited customization: Less fine-grained control than Matplotlib
- Learning curve: Different paradigm from traditional plotting libraries
- Browser dependency: Charts are rendered with JavaScript (Vega-Lite), so displaying them requires a browser or notebook front end
Example Code:
import altair as alt
import pandas as pd
import numpy as np
# Create sample data
x = np.linspace(0, 10, 100)
y = np.sin(x) + np.random.normal(0, 0.2, size=len(x))
data = pd.DataFrame({'x': x, 'y': y})
# Create a simple scatter plot with interactive tooltips
chart = alt.Chart(data).mark_circle().encode(
    x='x',
    y='y',
    tooltip=['x', 'y']
).properties(
    width=600,
    height=300,
    title='Interactive Altair Scatter Plot'
).interactive()
# Add a regression line
regression = alt.Chart(data).transform_regression(
    'x', 'y'
).mark_line(color='red').encode(
    x='x',
    y='y'
)
# Combine the plots
final_chart = chart + regression
# Display the chart
final_chart
When to use Altair:
- You want interactive visualizations
- You prefer a declarative approach to visualization
- You’re working with small to medium-sized datasets
- You want to publish visualizations on the web
- You appreciate a consistent grammar of graphics
Common Visualization Types Comparison
Scatter Plot
Matplotlib:
import matplotlib.pyplot as plt
import numpy as np
x = np.random.randn(100)
y = np.random.randn(100)
plt.figure(figsize=(8, 6))
plt.scatter(x, y, alpha=0.7)
plt.title('Matplotlib Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.grid(True)
plt.show()
Seaborn:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = pd.DataFrame({
    'x': np.random.randn(100),
    'y': np.random.randn(100)
})
sns.set_theme(style="whitegrid")
plt.figure(figsize=(8, 6))
sns.scatterplot(data=data, x='x', y='y', alpha=0.7)
plt.title('Seaborn Scatter Plot')
plt.show()
Altair:
import altair as alt
import pandas as pd
import numpy as np
data = pd.DataFrame({
    'x': np.random.randn(100),
    'y': np.random.randn(100)
})
alt.Chart(data).mark_circle(opacity=0.7).encode(
    x='x',
    y='y'
).properties(
    width=500,
    height=400,
    title='Altair Scatter Plot'
)
Histogram
Matplotlib:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(1000)
plt.figure(figsize=(8, 6))
plt.hist(data, bins=30, alpha=0.7, edgecolor='black')
plt.title('Matplotlib Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.grid(True, alpha=0.3)
plt.show()
Seaborn:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(1000)
sns.set_theme(style="whitegrid")
plt.figure(figsize=(8, 6))
sns.histplot(data=data, bins=30, kde=True)
plt.title('Seaborn Histogram with KDE')
plt.show()
Altair:
import altair as alt
import pandas as pd
import numpy as np
data = pd.DataFrame({'value': np.random.randn(1000)})
alt.Chart(data).mark_bar().encode(
    alt.X('value', bin=alt.Bin(maxbins=30)),
    y='count()'
).properties(
    width=500,
    height=400,
    title='Altair Histogram'
)
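All three histogram examples reduce the raw values to bin counts before drawing bars. The NumPy-only sketch below shows that underlying computation for 30 equal-width bins. It is meant as an illustration of the idea, not as the exact code path of any of the libraries. Note that Altair's `maxbins=30` is an upper bound, so its bin edges may differ from these.

```python
import numpy as np

# Same kind of data as in the histogram examples, with a fixed seed
rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

# counts[i] is the number of values falling in [edges[i], edges[i + 1])
# (the last bin also includes its right edge)
counts, edges = np.histogram(data, bins=30)

print(len(counts), len(edges))
print(counts.sum())
```

With 30 bins there are 31 edges, and the counts add up to the number of observations.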
Line Plot
Matplotlib:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
plt.figure(figsize=(10, 6))
plt.plot(x, y1, label='Sine')
plt.plot(x, y2, label='Cosine')
plt.title('Matplotlib Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()
Seaborn:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
x = np.linspace(0, 10, 100)
data = pd.DataFrame({
    'x': np.concatenate([x, x]),
    'y': np.concatenate([np.sin(x), np.cos(x)]),
    'function': ['Sine']*100 + ['Cosine']*100
})
sns.set_theme(style="darkgrid")
plt.figure(figsize=(10, 6))
sns.lineplot(data=data, x='x', y='y', hue='function')
plt.title('Seaborn Line Plot')
plt.show()
Altair:
import altair as alt
import pandas as pd
import numpy as np
x = np.linspace(0, 10, 100)
data = pd.DataFrame({
    'x': np.concatenate([x, x]),
    'y': np.concatenate([np.sin(x), np.cos(x)]),
    'function': ['Sine']*100 + ['Cosine']*100
})
alt.Chart(data).mark_line().encode(
    x='x',
    y='y',
    color='function'
).properties(
    width=600,
    height=400,
    title='Altair Line Plot'
)
Heatmap
Matplotlib:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.rand(10, 12)
plt.figure(figsize=(10, 8))
plt.imshow(data, cmap='viridis')
plt.colorbar(label='Value')
plt.title('Matplotlib Heatmap')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
Seaborn:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
data = np.random.rand(10, 12)
plt.figure(figsize=(10, 8))
sns.heatmap(data, annot=True, cmap='viridis', fmt='.2f')
plt.title('Seaborn Heatmap')
plt.show()
Altair:
import altair as alt
import pandas as pd
import numpy as np
# Create sample data
data = np.random.rand(10, 12)
df = pd.DataFrame(data)
# Reshape for Altair
df_long = df.reset_index().melt(id_vars='index')
df_long.columns = ['y', 'x', 'value']
alt.Chart(df_long).mark_rect().encode(
    x='x:O',
    y='y:O',
    color='value:Q'
).properties(
    width=500,
    height=400,
    title='Altair Heatmap'
)
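The reshape step in the Altair heatmap is the part that differs most from the Matplotlib and Seaborn versions, which accept a 2-D array directly. The pandas-only sketch below applies the same `reset_index().melt()` pattern to a tiny 2x3 array (an arbitrary example) so the wide-to-long conversion is easier to follow.

```python
import numpy as np
import pandas as pd

# A tiny 2x3 "matrix": rows become y, columns become x
data = np.arange(6).reshape(2, 3)
df = pd.DataFrame(data)

# Wide to long: one row per (row index, column label, cell value)
df_long = df.reset_index().melt(id_vars="index")
df_long.columns = ["y", "x", "value"]

print(df_long)
```

Each cell of the original matrix ends up as one row, which is the long-form layout that Altair's `x`, `y`, and `color` encodings expect.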
Decision Framework for Choosing a Library
Choose Matplotlib when:
- You need complete control over every detail of your visualization
- You’re creating complex, custom plots
- Your visualizations will be included in scientific publications
- You’re working with very large datasets
- You need to create animations or specialized chart types
Choose Seaborn when:
- You want attractive plots with minimal code
- You’re performing statistical analysis
- You want to create common statistical plots quickly
- You need to visualize relationships between variables
- You want good-looking defaults but still need some customization
Choose Altair when:
- You want interactive visualizations
- You prefer a declarative approach to visualization
- You want concise, readable code
- You’re creating dashboards or web-based visualizations
- You’re working with small to medium-sized datasets
Integration Examples
Combining Seaborn with Matplotlib:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
# Create sample data
np.random.seed(42)
data = pd.DataFrame({
    'x': np.random.normal(0, 1, 100),
    'y': np.random.normal(0, 1, 100),
    'category': np.random.choice(['A', 'B', 'C'], 100)
})
# Create a figure with Matplotlib
fig, ax = plt.subplots(figsize=(10, 6))
# Use Seaborn for the main plot
sns.scatterplot(data=data, x='x', y='y', hue='category', ax=ax)
# Add Matplotlib customizations
ax.set_title('Combining Matplotlib and Seaborn', fontsize=16)
ax.grid(True, linestyle='--', alpha=0.7)
ax.set_xlabel('X Variable', fontsize=12)
ax.set_ylabel('Y Variable', fontsize=12)
# Add annotations using Matplotlib
ax.annotate('Interesting Point', xy=(-1, 1), xytext=(-2, 1.5),
            arrowprops=dict(facecolor='black', shrink=0.05))
plt.tight_layout()
plt.show()
Using Altair with Pandas:
import altair as alt
import pandas as pd
import numpy as np
# Create sample data with pandas
np.random.seed(42)
df = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=100),
    'value': np.cumsum(np.random.randn(100)),
    'category': np.random.choice(['Group A', 'Group B'], 100)
})
# Use pandas to prepare the data
df['month'] = df['date'].dt.month
monthly_avg = df.groupby(['month', 'category'])['value'].mean().reset_index()
# Create the Altair visualization
chart = alt.Chart(monthly_avg).mark_line(point=True).encode(
    x='month:O',
    y='value:Q',
    color='category:N',
    tooltip=['month', 'value', 'category']
).properties(
    width=600,
    height=400,
    title='Monthly Averages by Category'
).interactive()
chart
Performance Comparison
For libraries like Matplotlib, Seaborn, and Altair, performance can vary widely depending on the size of your dataset and the complexity of your visualization. Here’s a general overview:
Small Datasets (< 1,000 points):
- All three libraries perform well
- Altair might have slightly more overhead due to its JSON specification generation
Medium Datasets (1,000 - 10,000 points):
- Matplotlib and Seaborn continue to perform well
- Altair starts to slow down but remains usable
Large Datasets (> 10,000 points):
- Matplotlib performs best for large static visualizations
- Seaborn becomes slower as it adds statistical computations
- Altair significantly slows down and may require data aggregation
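One way to aggregate before plotting is to bin the x values with pandas and keep only a summary per bin. The sketch below is an assumption-laden example: the column names, the 200,000-point synthetic series, and the 0.1 bin width are all arbitrary. It reduces the data to about a hundred rows that any of the three libraries can draw quickly.

```python
import numpy as np
import pandas as pd

# A large synthetic series: noisy sine wave with 200,000 points
rng = np.random.default_rng(0)
n = 200_000
x = np.linspace(0, 10, n)
df = pd.DataFrame({"x": x, "y": np.sin(x) + rng.normal(0, 0.2, size=n)})

# Round x to the nearest 0.1 and average y within each bin
df["x_bin"] = (df["x"] * 10).round() / 10
aggregated = df.groupby("x_bin", as_index=False)["y"].mean()

print(len(df), "rows reduced to", len(aggregated))
```

The aggregated frame can then be passed to the plotting call in place of the raw data, for example as the argument to `alt.Chart(...)`.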
Recommended Approaches for Large Data:
- Matplotlib: Use plot() instead of scatter() for line plots, or try hexbin() for density plots
- Seaborn: Use sample() or aggregation methods before plotting
- Altair: Use transform_sample() or pre-aggregate your data
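Two of these approaches can be sketched side by side: drawing a random sample with pandas `sample()`, and summarizing all points with Matplotlib's `hexbin()`. The dataset size, sample size, and `gridsize` below are arbitrary values chosen for illustration. The Agg backend is set only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 200,000 synthetic points
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.standard_normal(200_000),
    "y": rng.standard_normal(200_000),
})

# Option 1: plot a random sample instead of every point
sampled = df.sample(n=5_000, random_state=0)

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))
ax_left.plot(sampled["x"], sampled["y"], linestyle="none", marker=".",
             markersize=2, alpha=0.5)
ax_left.set_title("Random sample (5,000 points)")

# Option 2: bin all points into hexagons and color by count
hb = ax_right.hexbin(df["x"], df["y"], gridsize=50, cmap="viridis", mincnt=1)
fig.colorbar(hb, ax=ax_right, label="Points per bin")
ax_right.set_title("Hexbin density (all points)")

fig.tight_layout()
```

Sampling keeps individual points visible but discards data, while hexbin uses every point and shows density instead. Which trade-off is better depends on what the plot needs to show.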
Conclusion
The Python visualization ecosystem offers tools for every need, from low-level control to high-level abstraction:
- Matplotlib provides ultimate flexibility and control but requires more code and knowledge
- Seaborn offers a practical middle ground with statistical integration and clean defaults
- Altair delivers a concise, declarative approach with built-in interactivity
Rather than picking just one library, consider becoming familiar with all three and selecting the right tool for each visualization task. Many data scientists use a combination of these libraries, leveraging the strengths of each one as needed.
For those just starting, Seaborn provides a gentle entry point with attractive results for common visualization needs. As your skills advance, you can incorporate Matplotlib for customization and Altair for interactive visualizations.