🐍 Complete Python Tab Tutorial
From Setup to Advanced Multi-Environment Joins & Analysis
📋 Table of Contents
- Initial Setup – Add Environments
- Basics – List Tables & Query Data
- Single Table Analysis
- Join Tables – Same Environment
- Join Tables – Cross Environment
- Visualize Results
- Save Results Back to Dataverse
- Complete Real-World Examples
STEP 1: Initial Setup – Add Your Environments
1.1 Add Environments
Before using the Python tab, connect your environments:
1. Click ☰ menu → “🌐 Environments”
2. Click “Add New Environment”
3. Enter:
– Name: “Production” (your choice)
– URL: https://yourorg.crm.dynamics.com
4. Click “Connect”
5. Complete device code authentication
6. Click “Load All Tables”
7. Repeat for all environments:
– Production
– Development
– UAT
– Sandbox
– (as many as you need!)
✅ You can add UNLIMITED environments!
1.2 Verify Your Environments
After adding environments, verify they’re connected:
1. Go to Python tab (☰ menu → 🐍 Python)
2. Wait 30-60 seconds on first load (the Python runtime is downloaded)
3. Run this code to list your environments:
# List all connected environments
list_environments()
# Output will show:
# 📋 Available Environments:
# 1. Production
# 2. Development
# 3. UAT
# 4. Sandbox
STEP 2: Basics – Query Your First Table
2.1 Simple Query – Get All Records
import pandas as pd
# Query all accounts from Production (up to 5000 records)
df = query_dataverse("Production", "account")
# See what you got
print(f"Retrieved {len(df)} records")
print(f"Columns: {len(df.columns)}")
print("\nFirst 5 rows:")
print(df.head())
print("\nColumn names:")
print(df.columns.tolist())
What this does:
- Queries the "account" table from the "Production" environment
- Returns up to 5000 records as a pandas DataFrame
- Shows you the data structure
2.2 Query Specific Columns
import pandas as pd
# Only get the columns you need
df = query_dataverse("Production", "account",
                     select_columns=["accountid", "name", "revenue", "industrycode"])
print(df.head())
print(f"\nShape: {df.shape}")  # (rows, columns)
Why do this: Faster queries, less data transferred
2.3 Query with Filters
import pandas as pd
# Get only high-value accounts
df = query_dataverse("Production", "account",
                     select_columns=["name", "revenue", "industrycode"],
                     filter_query="revenue gt 1000000",
                     top=100)
print(f"High-value accounts: {len(df)}")
print(df.head())
Filter Examples:
| Filter | OData Query |
|---|---|
| Revenue > $1M | revenue gt 1000000 |
| Name starts with “A” | startswith(name, 'A') |
| Active accounts | statecode eq 0 |
| Not null revenue | revenue ne null |
| Multiple conditions | revenue gt 1000000 and statecode eq 0 |
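Because filters are plain OData strings, you can compose them in ordinary Python before passing the result to filter_query. A minimal sketch (the condition values are made up):

```python
# Build an OData filter string from individual conditions.
# This is just string manipulation; pass the result as filter_query.
conditions = [
    "revenue gt 1000000",
    "statecode eq 0",
    "startswith(name, 'A')",
]
filter_query = " and ".join(conditions)
print(filter_query)
# revenue gt 1000000 and statecode eq 0 and startswith(name, 'A')
```

You would then pass this string as the filter_query argument of query_dataverse().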
STEP 3: Single Table Analysis
Example 1: Revenue Analysis by Industry
import pandas as pd
import matplotlib.pyplot as plt
# Get account data
df = query_dataverse("Production", "account",
                     select_columns=["name", "revenue", "industrycode"],
                     filter_query="revenue ne null and industrycode ne null")
print(f"Analyzing {len(df)} accounts")
# Group by industry
industry_stats = df.groupby('industrycode').agg({
    'revenue': ['sum', 'mean', 'count']
}).reset_index()
industry_stats.columns = ['industry', 'total_revenue', 'avg_revenue', 'count']
# Sort by total revenue
industry_stats = industry_stats.sort_values('total_revenue', ascending=False)
print("\nTop 10 Industries:")
print(industry_stats.head(10))
# Visualize
plt.figure(figsize=(12, 6))
top10 = industry_stats.head(10)
plt.bar(range(len(top10)), top10['total_revenue'] / 1000000)  # Convert to millions
plt.xticks(range(len(top10)), top10['industry'], rotation=45, ha='right')
plt.title('Top 10 Industries by Revenue', fontsize=16, fontweight='bold')
plt.xlabel('Industry')
plt.ylabel('Total Revenue ($ Millions)')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
# Save to Dataverse
result = ingest_to_dataverse(industry_stats, "Production", "industry_revenue_analysis")
print(f"\n✅ Analysis saved to {result['table']}")
Example 2: Find Top Customers
import pandas as pd
# Get accounts
df = query_dataverse("Production", "account",
                     select_columns=["accountid", "name", "revenue", "createdon"],
                     filter_query="revenue ne null")
# Sort by revenue
df_sorted = df.sort_values('revenue', ascending=False)
# Top 20
top20 = df_sorted.head(20)
print("Top 20 Customers by Revenue:")
print(top20[['name', 'revenue']])
# Calculate statistics
print(f"\nTotal Revenue (Top 20): ${top20['revenue'].sum():,.2f}")
print(f"Average Revenue (Top 20): ${top20['revenue'].mean():,.2f}")
print(f"Median Revenue (Top 20): ${top20['revenue'].median():,.2f}")
# Save
ingest_to_dataverse(top20, "Production", "top20_customers")
STEP 4: Join Tables – Same Environment
⚡ Important Concept: Joining in pandas
After querying tables from Dataverse, you use pandas merge() to join them, NOT SQL JOINs.
This gives you full pandas power for complex analysis!
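To see how merge() behaves before touching live data, here is a tiny self-contained sketch with made-up accounts and contacts, using the same key columns as Example 3 below:

```python
import pandas as pd

# Toy stand-ins for the accounts and contacts DataFrames
accounts = pd.DataFrame({
    "accountid": ["A1", "A2", "A3"],
    "name": ["Contoso", "Fabrikam", "Northwind"],
})
contacts = pd.DataFrame({
    "contactid": ["C1", "C2", "C3"],
    "fullname": ["Ana", "Ben", "Cara"],
    "parentcustomerid": ["A1", "A1", "A2"],
})

# Inner join: keeps only contacts whose parent account exists,
# and drops accounts with no contacts (Northwind disappears)
merged = pd.merge(accounts, contacts,
                  left_on="accountid", right_on="parentcustomerid",
                  how="inner")
print(merged[["name", "fullname"]])
# 3 rows: Contoso/Ana, Contoso/Ben, Fabrikam/Cara
```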
Example 3: Join Accounts + Contacts
import pandas as pd
# Query accounts
accounts = query_dataverse("Production", "account",
                           select_columns=["accountid", "name", "revenue", "industrycode"])
# Query contacts
contacts = query_dataverse("Production", "contact",
                           select_columns=["contactid", "fullname", "emailaddress1", "parentcustomerid"])
print(f"Accounts: {len(accounts)}")
print(f"Contacts: {len(contacts)}")
# Join them (pandas merge)
# parentcustomerid in contacts = accountid in accounts
merged = pd.merge(
    accounts,
    contacts,
    left_on='accountid',
    right_on='parentcustomerid',
    how='inner'  # Only keep matches
)
print(f"\nMerged: {len(merged)} rows")
print("\nSample:")
print(merged[['name', 'fullname', 'emailaddress1', 'revenue']].head())
# Analysis: Contacts per account
contacts_per_account = merged.groupby('name').size().reset_index(name='contact_count')
contacts_per_account = contacts_per_account.sort_values('contact_count', ascending=False)
print("\nTop 10 Accounts by Contact Count:")
print(contacts_per_account.head(10))
# Visualize
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
top10 = contacts_per_account.head(10)
plt.barh(range(len(top10)), top10['contact_count'])
plt.yticks(range(len(top10)), top10['name'])
plt.xlabel('Number of Contacts')
plt.title('Top 10 Accounts by Contact Count')
plt.tight_layout()
plt.show()
# Save results
ingest_to_dataverse(contacts_per_account, "Production", "contacts_per_account_analysis")
Example 4: Three-Way Join (Accounts + Contacts + Orders)
import pandas as pd
# Query all three tables
accounts = query_dataverse("Production", "account",
                           select_columns=["accountid", "name", "industrycode"])
contacts = query_dataverse("Production", "contact",
                           select_columns=["contactid", "fullname", "parentcustomerid"])
orders = query_dataverse("Production", "salesorder",
                         select_columns=["salesorderid", "customerid", "totalamount", "createdon"],
                         top=10000)
# First join: Accounts + Contacts
acc_con = pd.merge(accounts, contacts,
                   left_on='accountid', right_on='parentcustomerid',
                   how='inner')
# Second join: Add Orders
full = pd.merge(acc_con, orders,
                left_on='accountid', right_on='customerid',
                how='inner')
print(f"Complete dataset: {len(full)} rows")
# Analysis: Revenue by industry
industry_revenue = full.groupby('industrycode')['totalamount'].sum().reset_index()
industry_revenue.columns = ['industry', 'total_order_value']
industry_revenue = industry_revenue.sort_values('total_order_value', ascending=False)
print("\nRevenue by Industry (from orders):")
print(industry_revenue.head(10))
# Save
ingest_to_dataverse(industry_revenue, "Production", "industry_order_revenue")
STEP 5: Join Tables – Cross Environment
Why Cross-Environment Joins?
- Compare Dev vs Prod data
- Migrate data with validation
- Consolidate data from multiple orgs
- Build unified reports across environments
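The comparison pattern in Example 5 below relies on an outer merge with indicator=True. A tiny sketch with made-up account names shows what the _merge column reports:

```python
import pandas as pd

# Toy stand-ins for accounts pulled from two environments
prod = pd.DataFrame({"name": ["Contoso", "Fabrikam"], "revenue": [100, 200]})
dev = pd.DataFrame({"name": ["Fabrikam", "Northwind"], "revenue": [210, 50]})

# Outer merge keeps every account; indicator=True adds a _merge
# column saying which environment each row came from
comparison = pd.merge(prod, dev, on="name", how="outer",
                      suffixes=("_prod", "_dev"), indicator=True)
print(comparison["_merge"].value_counts())
# left_only = only in prod, right_only = only in dev, both = in both
```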
Example 5: Compare Production vs Development
import pandas as pd
# Query Production accounts
prod_accounts = query_dataverse("Production", "account",
                                select_columns=["name", "revenue", "accountid"])
# Query Development accounts
dev_accounts = query_dataverse("Development", "account",
                               select_columns=["name", "revenue", "accountid"])
print(f"Production: {len(prod_accounts)} accounts")
print(f"Development: {len(dev_accounts)} accounts")
# Compare by name
comparison = pd.merge(
    prod_accounts,
    dev_accounts,
    on='name',
    how='outer',  # Keep all records from both
    suffixes=('_prod', '_dev'),
    indicator=True  # Shows which side each record came from
)
# Analyze differences
only_prod = comparison[comparison['_merge'] == 'left_only']
only_dev = comparison[comparison['_merge'] == 'right_only']
in_both = comparison[comparison['_merge'] == 'both'].copy()  # copy so we can add columns safely
print(f"\nOnly in Production: {len(only_prod)}")
print(f"Only in Development: {len(only_dev)}")
print(f"In both: {len(in_both)}")
# Revenue comparison for accounts in both
if len(in_both) > 0:
    in_both['revenue_diff'] = in_both['revenue_prod'] - in_both['revenue_dev']
    print("\nRevenue differences (Prod - Dev):")
    print(in_both[['name', 'revenue_prod', 'revenue_dev', 'revenue_diff']].head(10))
# Save comparison report
ingest_to_dataverse(comparison, "Production", "prod_vs_dev_comparison")
Example 6: Cross-Org Customer Consolidation
import pandas as pd
# Scenario: You have separate Dataverse orgs for different regions
# and want to consolidate customer data
# Query North America org
na_customers = query_dataverse("North_America", "account",
                               select_columns=["name", "revenue", "address1_city", "address1_country"])
# Query Europe org
eu_customers = query_dataverse("Europe", "account",
                               select_columns=["name", "revenue", "address1_city", "address1_country"])
# Query Asia org
asia_customers = query_dataverse("Asia", "account",
                                 select_columns=["name", "revenue", "address1_city", "address1_country"])
# Add region identifier
na_customers['region'] = 'North America'
eu_customers['region'] = 'Europe'
asia_customers['region'] = 'Asia'
# Consolidate
global_customers = pd.concat([na_customers, eu_customers, asia_customers], ignore_index=True)
print(f"Total global customers: {len(global_customers)}")
# Analysis by region
region_stats = global_customers.groupby('region').agg({
    'name': 'count',
    'revenue': 'sum'
}).reset_index()
region_stats.columns = ['region', 'customer_count', 'total_revenue']
print("\nGlobal Summary:")
print(region_stats)
# Visualize
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1, 2, figsize=(14, 6))
# Customer count by region
axs[0].bar(region_stats['region'], region_stats['customer_count'])
axs[0].set_title('Customers by Region')
axs[0].set_ylabel('Count')
# Revenue by region
axs[1].bar(region_stats['region'], region_stats['total_revenue'] / 1000000)
axs[1].set_title('Revenue by Region')
axs[1].set_ylabel('Revenue ($ Millions)')
plt.tight_layout()
plt.show()
# Save consolidated data to a central environment
ingest_to_dataverse(global_customers, "Global_HQ", "consolidated_customers")
STEP 6: Visualize Your Results
Example 7: Multiple Visualizations
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Query data
df = query_dataverse("Production", "account",
                     select_columns=["name", "revenue", "industrycode", "createdon"],
                     filter_query="revenue ne null")
# Convert createdon to datetime
df['createdon'] = pd.to_datetime(df['createdon'])
df['year'] = df['createdon'].dt.year
# Create multiple plots
fig, axs = plt.subplots(2, 2, figsize=(16, 12))
# Plot 1: Revenue distribution
axs[0, 0].hist(df['revenue'] / 1000, bins=50, edgecolor='black', alpha=0.7)
axs[0, 0].set_title('Revenue Distribution', fontsize=14, fontweight='bold')
axs[0, 0].set_xlabel('Revenue ($1000s)')
axs[0, 0].set_ylabel('Number of Accounts')
axs[0, 0].grid(alpha=0.3)
# Plot 2: Top 10 industries
industry_counts = df['industrycode'].value_counts().head(10)
axs[0, 1].barh(range(len(industry_counts)), industry_counts.values)
axs[0, 1].set_yticks(range(len(industry_counts)))
axs[0, 1].set_yticklabels(industry_counts.index)
axs[0, 1].set_title('Top 10 Industries by Account Count', fontsize=14, fontweight='bold')
axs[0, 1].set_xlabel('Number of Accounts')
# Plot 3: Revenue by year
yearly_revenue = df.groupby('year')['revenue'].sum() / 1000000
axs[1, 0].plot(yearly_revenue.index, yearly_revenue.values, marker='o', linewidth=2, markersize=8)
axs[1, 0].set_title('Revenue Trend by Year', fontsize=14, fontweight='bold')
axs[1, 0].set_xlabel('Year')
axs[1, 0].set_ylabel('Total Revenue ($ Millions)')
axs[1, 0].grid(alpha=0.3)
# Plot 4: Industry revenue pie chart
top_industries = df.groupby('industrycode')['revenue'].sum().nlargest(5)
axs[1, 1].pie(top_industries.values, labels=top_industries.index, autopct='%1.1f%%')
axs[1, 1].set_title('Revenue Share - Top 5 Industries', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
print("✅ Visualizations created!")
STEP 7: Save Results Back to Dataverse
Why Save Results?
- Share analysis with your team
- Use results in Power BI reports
- Create audit trails
- Build automated reporting pipelines
- Historical tracking of analysis results
Example 8: Complete Analysis Pipeline with Save
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
# Get data
df = query_dataverse("Production", "account",
                     select_columns=["accountid", "name", "revenue", "industrycode"])
# Perform analysis
summary = df.groupby('industrycode').agg({
    'accountid': 'count',
    'revenue': ['sum', 'mean', 'min', 'max']
}).reset_index()
summary.columns = ['industry', 'account_count', 'total_revenue', 'avg_revenue', 'min_revenue', 'max_revenue']
# Add metadata
summary['analysis_date'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
summary['data_source'] = 'Production'
# Sort by total revenue
summary = summary.sort_values('total_revenue', ascending=False)
print("Analysis Summary:")
print(summary.head(10))
# Visualize
plt.figure(figsize=(12, 6))
plt.bar(summary['industry'].head(10), summary['total_revenue'].head(10) / 1000000)
plt.xticks(rotation=45, ha='right')
plt.title('Top 10 Industries by Revenue')
plt.ylabel('Revenue ($ Millions)')
plt.tight_layout()
plt.show()
# Save to staging table in Production
result = ingest_to_dataverse(summary, "Production", "industry_analysis_results")
print("\n✅ Analysis saved!")
print(f"   Table: {result['table']}")
print(f"   Records: {result['success']}")
print(f"   Errors: {result['errors']}")
# You can also save to a different environment
result2 = ingest_to_dataverse(summary, "Development", "industry_analysis_results")
print("\n✅ Also saved to Development!")
STEP 8: Complete Real-World Examples
🎯 Complete Example 1: Customer Segmentation (RFM Analysis)
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
# STEP 1: Get customer data from Production
customers = query_dataverse("Production", "account",
                            select_columns=["accountid", "name", "createdon"])
# STEP 2: Get order data
orders = query_dataverse("Production", "salesorder",
                         select_columns=["salesorderid", "customerid", "totalamount", "createdon"],
                         filter_query="statecode eq 0",
                         top=50000)
print(f"Customers: {len(customers)}")
print(f"Orders: {len(orders)}")
# STEP 3: Calculate RFM metrics
current_date = pd.Timestamp.now()
orders['createdon'] = pd.to_datetime(orders['createdon'])
rfm = orders.groupby('customerid').agg({
    'createdon': lambda x: (current_date - x.max()).days,  # Recency
    'salesorderid': 'count',                               # Frequency
    'totalamount': 'sum'                                   # Monetary
}).reset_index()
rfm.columns = ['customerid', 'recency_days', 'order_count', 'total_spent']
# STEP 4: Score each metric (1-5)
rfm['recency_score'] = pd.qcut(rfm['recency_days'], 5, labels=[5, 4, 3, 2, 1])
rfm['frequency_score'] = pd.qcut(rfm['order_count'].rank(method='first'), 5, labels=[1, 2, 3, 4, 5])
rfm['monetary_score'] = pd.qcut(rfm['total_spent'], 5, labels=[1, 2, 3, 4, 5])
# Total RFM score
rfm['rfm_score'] = (rfm['recency_score'].astype(int) +
                    rfm['frequency_score'].astype(int) +
                    rfm['monetary_score'].astype(int))
# STEP 5: Segment customers
def segment_customer(score):
    if score >= 13:
        return 'Champions'
    elif score >= 10:
        return 'Loyal'
    elif score >= 7:
        return 'Potential'
    else:
        return 'At Risk'

rfm['segment'] = rfm['rfm_score'].apply(segment_customer)
# STEP 6: Join with customer names
rfm_with_names = pd.merge(rfm, customers, left_on='customerid', right_on='accountid', how='left')
# STEP 7: Segment summary
segment_summary = rfm.groupby('segment').agg({
    'customerid': 'count',
    'total_spent': 'sum',
    'order_count': 'sum'
}).reset_index()
segment_summary.columns = ['segment', 'customer_count', 'total_revenue', 'total_orders']
print("\n📊 Customer Segments:")
print(segment_summary)
# STEP 8: Visualize
fig, axs = plt.subplots(1, 3, figsize=(18, 5))
# Customers by segment
axs[0].bar(segment_summary['segment'], segment_summary['customer_count'])
axs[0].set_title('Customers by Segment')
axs[0].set_ylabel('Count')
axs[0].tick_params(axis='x', rotation=45)
# Revenue by segment
axs[1].bar(segment_summary['segment'], segment_summary['total_revenue'] / 1000000)
axs[1].set_title('Revenue by Segment')
axs[1].set_ylabel('Revenue ($ Millions)')
axs[1].tick_params(axis='x', rotation=45)
# Orders by segment
axs[2].bar(segment_summary['segment'], segment_summary['total_orders'])
axs[2].set_title('Orders by Segment')
axs[2].set_ylabel('Order Count')
axs[2].tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
# STEP 9: Save results
result1 = ingest_to_dataverse(rfm_with_names, "Production", "customer_rfm_scores")
result2 = ingest_to_dataverse(segment_summary, "Production", "customer_segments")
print("\n✅ Analysis complete!")
print(f"   RFM Scores saved: {result1['table']}")
print(f"   Segment Summary saved: {result2['table']}")
print("\n💡 Next steps:")
print("   - Use these segments in marketing campaigns")
print("   - Build Power BI reports from staging tables")
print("   - Set up automated monthly RFM updates")
🎯 Complete Example 2: Cross-Environment Data Migration
import pandas as pd
# SCENARIO: Migrate cleaned customer data from Dev to Prod
# STEP 1: Extract from Development
print("📤 STEP 1: Extracting from Development…")
dev_data = query_dataverse("Development", "account",
                           select_columns=["name", "revenue", "industrycode",
                                           "address1_city", "emailaddress1"],
                           filter_query="statecode eq 0")  # Active only
print(f"   Extracted: {len(dev_data)} records")
# STEP 2: Data Quality Checks
print("\n🔍 STEP 2: Running quality checks…")
initial_count = len(dev_data)
# Remove nulls
dev_data = dev_data.dropna(subset=['name'])
print(f"   After null removal: {len(dev_data)} records")
# Remove duplicates
dev_data = dev_data.drop_duplicates(subset=['name'])
print(f"   After deduplication: {len(dev_data)} records")
# Validate revenue
dev_data = dev_data[dev_data['revenue'] > 0].copy()  # copy so later column adds don't warn
print(f"   After validation: {len(dev_data)} records")
removed = initial_count - len(dev_data)
print(f"   ❌ Removed {removed} invalid records")
# STEP 3: Check if records exist in Production
print("\n🔍 STEP 3: Checking for duplicates in Production…")
prod_data = query_dataverse("Production", "account",
                            select_columns=["name", "accountid"])
# Find new records only
dev_data['exists_in_prod'] = dev_data['name'].isin(prod_data['name'])
new_records = dev_data[~dev_data['exists_in_prod']].copy()
new_records = new_records.drop('exists_in_prod', axis=1)
print(f"   Total in Dev: {len(dev_data)}")
print(f"   Already in Prod: {len(dev_data) - len(new_records)}")
print(f"   New to migrate: {len(new_records)}")
# STEP 4: Transform data (if needed)
print("\n🔄 STEP 4: Transforming data…")
new_records['name'] = new_records['name'].str.upper()
new_records['migration_date'] = pd.Timestamp.now().strftime('%Y-%m-%d')
new_records['data_source'] = 'Development'
# STEP 5: Ingest to Production
print("\n📥 STEP 5: Ingesting to Production…")
result = ingest_to_dataverse(new_records, "Production", "migrated_accounts")
# STEP 6: Verification
print("\n✅ MIGRATION COMPLETE!")
print(f"   Target table: {result['table']}")
print(f"   Successfully migrated: {result['success']} records")
print(f"   Errors: {result['errors']} records")
print("\n📊 Summary:")
print(f"   Started with: {initial_count} records (Dev)")
print(f"   After cleaning: {len(dev_data)} records")
print(f"   New records: {len(new_records)}")
print(f"   Migrated: {result['success']}")
print("\n💡 Next: Review staging table in Production and move to final table")
🎯 Complete Example 3: Multi-Environment Consolidated Report
import pandas as pd
import matplotlib.pyplot as plt
# SCENARIO: Create consolidated revenue report from all environments
print("📊 BUILDING MULTI-ENVIRONMENT REVENUE REPORT")
print("=" * 60)
# STEP 1: Query each environment
environments = ["Production", "UAT", "Development"]
all_data = []
for env in environments:
    print(f"\n📍 Querying {env}…")
    try:
        df = query_dataverse(env, "account",
                             select_columns=["name", "revenue", "industrycode"],
                             filter_query="revenue ne null")
        df['environment'] = env
        all_data.append(df)
        print(f"   ✅ {len(df)} records")
    except Exception as e:
        print(f"   ❌ Error: {e}")
# STEP 2: Consolidate
consolidated = pd.concat(all_data, ignore_index=True)
print(f"\n📦 Total consolidated: {len(consolidated)} records")
# STEP 3: Analysis by environment
env_stats = consolidated.groupby('environment').agg({
    'revenue': ['sum', 'mean', 'count']
}).reset_index()
env_stats.columns = ['environment', 'total_revenue', 'avg_revenue', 'account_count']
print("\n📊 Environment Summary:")
print(env_stats)
# STEP 4: Industry analysis across all environments
industry_stats = consolidated.groupby('industrycode').agg({
    'revenue': 'sum',
    'name': 'count'
}).reset_index()
industry_stats.columns = ['industry', 'total_revenue', 'account_count']
industry_stats = industry_stats.sort_values('total_revenue', ascending=False)
print("\n📊 Top 10 Industries (All Environments):")
print(industry_stats.head(10))
# STEP 5: Cross-environment comparison by industry
industry_by_env = consolidated.pivot_table(
    index='industrycode',
    columns='environment',
    values='revenue',
    aggfunc='sum',
    fill_value=0
).reset_index()
print("\n📊 Revenue by Industry and Environment:")
print(industry_by_env.head(10))
# STEP 6: Visualize
fig, axs = plt.subplots(2, 2, figsize=(16, 12))
# Chart 1: Revenue by environment
axs[0, 0].bar(env_stats['environment'], env_stats['total_revenue'] / 1000000)
axs[0, 0].set_title('Total Revenue by Environment')
axs[0, 0].set_ylabel('Revenue ($ Millions)')
axs[0, 0].tick_params(axis='x', rotation=45)
# Chart 2: Account count by environment
axs[0, 1].bar(env_stats['environment'], env_stats['account_count'])
axs[0, 1].set_title('Account Count by Environment')
axs[0, 1].set_ylabel('Number of Accounts')
axs[0, 1].tick_params(axis='x', rotation=45)
# Chart 3: Top 10 industries
top10_ind = industry_stats.head(10)
axs[1, 0].barh(range(len(top10_ind)), top10_ind['total_revenue'] / 1000000)
axs[1, 0].set_yticks(range(len(top10_ind)))
axs[1, 0].set_yticklabels(top10_ind['industry'])
axs[1, 0].set_xlabel('Revenue ($ Millions)')
axs[1, 0].set_title('Top 10 Industries (All Environments)')
# Chart 4: Grouped bars - Revenue by environment and industry (top 5)
top5_ind = industry_stats.head(5)['industry']
env_ind_subset = industry_by_env[industry_by_env['industrycode'].isin(top5_ind)]
x = range(len(environments))
width = 0.15
for i, industry in enumerate(top5_ind):
    row = env_ind_subset[env_ind_subset['industrycode'] == industry]
    values = [row[env].values[0] / 1000000 if env in row.columns else 0 for env in environments]
    axs[1, 1].bar([p + width * i for p in x], values, width, label=industry)
axs[1, 1].set_xlabel('Environment')
axs[1, 1].set_ylabel('Revenue ($ Millions)')
axs[1, 1].set_title('Top 5 Industries by Environment')
axs[1, 1].set_xticks([p + width * 2 for p in x])
axs[1, 1].set_xticklabels(environments)
axs[1, 1].legend()
plt.tight_layout()
plt.show()
# STEP 7: Save consolidated report
print("\n💾 Saving results…")
# Save environment summary
result1 = ingest_to_dataverse(env_stats, "Production", "multi_env_summary")
# Save industry analysis
result2 = ingest_to_dataverse(industry_stats, "Production", "multi_env_industry_analysis")
# Save detailed data
result3 = ingest_to_dataverse(consolidated, "Production", "multi_env_consolidated_data")
print("\n✅ REPORT COMPLETE!")
print(f"   Environment summary: {result1['table']}")
print(f"   Industry analysis: {result2['table']}")
print(f"   Consolidated data: {result3['table']}")
print(f"\n📊 Total records saved: {result1['success'] + result2['success'] + result3['success']}")
🎓 Quick Reference
Essential Functions
| Function | Purpose | Example |
|---|---|---|
| list_environments() | See connected environments | list_environments() |
| query_dataverse() | Get data from a table | df = query_dataverse("Prod", "account") |
| ingest_to_dataverse() | Save results back | ingest_to_dataverse(df, "Prod", "results") |
| pd.merge() | Join DataFrames | pd.merge(df1, df2, on='key') |
| pd.concat() | Stack DataFrames | pd.concat([df1, df2]) |
| plt.show() | Display plot | plt.bar(x, y); plt.show() |
Common pandas Operations
| Task | Code |
|---|---|
| Group by column | df.groupby('column').sum() |
| Filter rows | df[df['revenue'] > 1000000] |
| Sort | df.sort_values('revenue', ascending=False) |
| Top N | df.nlargest(10, 'revenue') |
| Count unique | df['column'].nunique() |
| Remove nulls | df.dropna() |
| Remove duplicates | df.drop_duplicates() |
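A quick sanity check of a few of these operations on a throwaway DataFrame (the values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "industry": ["Tech", "Tech", "Retail"],
    "revenue": [3_000_000, 1_500_000, 500_000],
})

big = df[df["revenue"] > 1_000_000]               # filter rows
top1 = df.nlargest(1, "revenue")                  # top N
by_ind = df.groupby("industry")["revenue"].sum()  # group by

print(len(big), top1["revenue"].iloc[0], by_ind["Tech"])
# 2 3000000 4500000
```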
Join Types
| Join Type | Description | Code |
|---|---|---|
| Inner Join | Only matching records | pd.merge(df1, df2, on='key', how='inner') |
| Left Join | All from left, matching from right | pd.merge(df1, df2, on='key', how='left') |
| Right Join | All from right, matching from left | pd.merge(df1, df2, on='key', how='right') |
| Outer Join | All records from both | pd.merge(df1, df2, on='key', how='outer') |
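To see how the four join types differ in practice, here is a tiny sketch with two throwaway DataFrames that share only one key:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2], "a": ["x", "y"]})
right = pd.DataFrame({"key": [2, 3], "b": ["p", "q"]})

# Row counts for each join type on the same pair of frames
counts = {how: len(pd.merge(left, right, on="key", how=how))
          for how in ["inner", "left", "right", "outer"]}
print(counts)
# {'inner': 1, 'left': 2, 'right': 2, 'outer': 3}
```

Only key 2 exists on both sides, so inner keeps one row while outer keeps all three keys.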
💡 Pro Tips
Performance Tips
- ✅ Use select_columns to only get what you need
- ✅ Use filter_query to reduce data at source
- ✅ Use the top parameter for large tables
- ✅ Query during off-peak hours for large datasets
- ✅ Save intermediate results to staging tables
Best Practices
- ✅ Always check data quality before joining
- ✅ Use .head() to preview before full processing
- ✅ Add metadata (date, source) to analysis results
- ✅ Test joins on small subsets first
- ✅ Document your analysis in comments
- ✅ Save results with descriptive table names
Troubleshooting
- ❌ “Environment not found” → Check name matches exactly (case-sensitive)
- ❌ “No records returned” → Check filter_query syntax
- ❌ “Join produces no results” → Check key column names match
- ❌ “Too many records” → Use the top parameter
- ❌ “Memory error” → Process in smaller batches
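For the last point, one simple batching pattern is to aggregate a large DataFrame in fixed-size chunks and then combine the partial results; a minimal sketch with toy data:

```python
import pandas as pd

# Toy stand-in for a large query result
df = pd.DataFrame({"revenue": range(10)})

# Aggregate in fixed-size chunks, then combine the partial results
chunk_size = 4
partial = []
for start in range(0, len(df), chunk_size):
    chunk = df.iloc[start:start + chunk_size]
    partial.append(chunk["revenue"].sum())
total = sum(partial)
print(total)  # 45, same as df["revenue"].sum()
```

The same pattern works with repeated query_dataverse calls using the top parameter to cap each batch.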
🎉 Summary
You Now Know How To:
- ✅ Connect unlimited environments
- ✅ Query tables from any environment
- ✅ Filter and select specific data
- ✅ Analyze single tables with pandas
- ✅ Join tables from same environment
- ✅ Join tables across different environments
- ✅ Consolidate data from multiple orgs
- ✅ Create visualizations with matplotlib
- ✅ Save results back to Dataverse
- ✅ Build complete ETL pipelines
You have the power to analyze ANY data from ANY environment! 🚀
