🐍 Complete Python Tab Tutorial
From Setup to Advanced Multi-Environment Joins & Analysis
📋 Table of Contents
- Initial Setup – Add Environments
- Basics – List Tables & Query Data
- Single Table Analysis
- Join Tables – Same Environment
- Join Tables – Cross Environment
- Visualize Results
- Save Results Back to Dataverse
- Complete Real-World Examples
STEP 1: Initial Setup – Add Your Environments
1.1 Add Environments
Before using the Python tab, connect your environments:
1. Click ☰ menu → “🌐 Environments”
2. Click “Add New Environment”
3. Enter:
– Name: “Production” (your choice)
– URL: https://yourorg.crm.dynamics.com
4. Click “Connect”
5. Complete device code authentication
6. Click “Load All Tables”
7. Repeat for all environments:
– Production
– Development
– UAT
– Sandbox
– (as many as you need!)
✅ You can add UNLIMITED environments!
1.2 Verify Your Environments
After adding environments, verify they’re connected:
1. Go to Python tab (☰ menu → 🐍 Python)
2. Wait 30-60 seconds on first load (the Python runtime is downloaded)
3. Run this code to list your environments:
# List all connected environments
list_environments()
# Output will show:
# 📋 Available Environments:
# 1. Production
# 2. Development
# 3. UAT
# 4. Sandbox
STEP 2: Basics – Query Your First Table
2.1 Simple Query – Get All Records
import pandas as pd
# Query all accounts from Production (up to 5000 records)
df = query_dataverse("Production", "account")
# See what you got
print(f"Retrieved {len(df)} records")
print(f"Columns: {len(df.columns)}")
print("\nFirst 5 rows:")
print(df.head())
print("\nColumn names:")
print(df.columns.tolist())
What this does:
- Queries the "account" table from the "Production" environment
- Returns up to 5000 records as a pandas DataFrame
- Shows you the data structure
2.2 Query Specific Columns
import pandas as pd
# Only get the columns you need
df = query_dataverse("Production", "account",
                     select_columns=["accountid", "name", "revenue", "industrycode"])
print(df.head())
print(f"\nShape: {df.shape}")  # (rows, columns)
Why do this: Faster queries, less data transferred
2.3 Query with Filters
import pandas as pd
# Get only high-value accounts
df = query_dataverse("Production", "account",
                     select_columns=["name", "revenue", "industrycode"],
                     filter_query="revenue gt 1000000",
                     top=100)
print(f"High-value accounts: {len(df)}")
print(df.head())
Filter Examples:
| Filter | OData Query |
|---|---|
| Revenue > $1M | revenue gt 1000000 |
| Name starts with “A” | startswith(name, 'A') |
| Active accounts | statecode eq 0 |
| Not null revenue | revenue ne null |
| Multiple conditions | revenue gt 1000000 and statecode eq 0 |
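Because filters are plain OData strings, you can compose them in ordinary Python before passing the result to filter_query. A minimal sketch (the condition values are made up):

```python
# Build an OData filter string from individual conditions.
# This is just string manipulation; pass the result as filter_query.
conditions = [
    "revenue gt 1000000",
    "statecode eq 0",
    "startswith(name, 'A')",
]
filter_query = " and ".join(conditions)
print(filter_query)
# revenue gt 1000000 and statecode eq 0 and startswith(name, 'A')
```

You would then pass this string as the filter_query argument of query_dataverse().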
STEP 3: Single Table Analysis
Example 1: Revenue Analysis by Industry
import pandas as pd
import matplotlib.pyplot as plt
# Get account data
df = query_dataverse("Production", "account",
                     select_columns=["name", "revenue", "industrycode"],
                     filter_query="revenue ne null and industrycode ne null")
print(f"Analyzing {len(df)} accounts")
# Group by industry
industry_stats = df.groupby('industrycode').agg({
    'revenue': ['sum', 'mean', 'count']
}).reset_index()
industry_stats.columns = ['industry', 'total_revenue', 'avg_revenue', 'count']
# Sort by total revenue
industry_stats = industry_stats.sort_values('total_revenue', ascending=False)
print("\nTop 10 Industries:")
print(industry_stats.head(10))
# Visualize
plt.figure(figsize=(12, 6))
top10 = industry_stats.head(10)
plt.bar(range(len(top10)), top10['total_revenue'] / 1000000)  # Convert to millions
plt.xticks(range(len(top10)), top10['industry'], rotation=45, ha='right')
plt.title('Top 10 Industries by Revenue', fontsize=16, fontweight='bold')
plt.xlabel('Industry')
plt.ylabel('Total Revenue ($ Millions)')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
# Save to Dataverse
result = ingest_to_dataverse(industry_stats, "Production", "industry_revenue_analysis")
print(f"\n✅ Analysis saved to {result['table']}")
Example 2: Find Top Customers
import pandas as pd
# Get accounts
df = query_dataverse("Production", "account",
                     select_columns=["accountid", "name", "revenue", "createdon"],
                     filter_query="revenue ne null")
# Sort by revenue
df_sorted = df.sort_values('revenue', ascending=False)
# Top 20
top20 = df_sorted.head(20)
print("Top 20 Customers by Revenue:")
print(top20[['name', 'revenue']])
# Calculate statistics
print(f"\nTotal Revenue (Top 20): ${top20['revenue'].sum():,.2f}")
print(f"Average Revenue (Top 20): ${top20['revenue'].mean():,.2f}")
print(f"Median Revenue (Top 20): ${top20['revenue'].median():,.2f}")
# Save
ingest_to_dataverse(top20, "Production", "top20_customers")
STEP 4: Join Tables – Same Environment
⚡ Important Concept: Joining in pandas
After querying tables from Dataverse, you use pandas merge() to join them, NOT SQL JOINs.
This gives you full pandas power for complex analysis!
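To see how merge() behaves before touching live data, here is a tiny self-contained sketch with made-up accounts and contacts, using the same key columns as Example 3 below:

```python
import pandas as pd

# Toy stand-ins for the accounts and contacts DataFrames
accounts = pd.DataFrame({
    "accountid": ["A1", "A2", "A3"],
    "name": ["Contoso", "Fabrikam", "Northwind"],
})
contacts = pd.DataFrame({
    "contactid": ["C1", "C2", "C3"],
    "fullname": ["Ana", "Ben", "Cara"],
    "parentcustomerid": ["A1", "A1", "A2"],
})

# Inner join: keeps only contacts whose parent account exists,
# and drops accounts with no contacts (Northwind disappears)
merged = pd.merge(accounts, contacts,
                  left_on="accountid", right_on="parentcustomerid",
                  how="inner")
print(merged[["name", "fullname"]])
# 3 rows: Contoso/Ana, Contoso/Ben, Fabrikam/Cara
```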
Example 3: Join Accounts + Contacts
import pandas as pd
# Query accounts
accounts = query_dataverse("Production", "account",
                           select_columns=["accountid", "name", "revenue", "industrycode"])
# Query contacts
contacts = query_dataverse("Production", "contact",
                           select_columns=["contactid", "fullname", "emailaddress1", "parentcustomerid"])
print(f"Accounts: {len(accounts)}")
print(f"Contacts: {len(contacts)}")
# Join them (pandas merge)
# parentcustomerid in contacts = accountid in accounts
merged = pd.merge(
    accounts,
    contacts,
    left_on='accountid',
    right_on='parentcustomerid',
    how='inner'  # Only keep matches
)
print(f"\nMerged: {len(merged)} rows")
print("\nSample:")
print(merged[['name', 'fullname', 'emailaddress1', 'revenue']].head())
# Analysis: Contacts per account
contacts_per_account = merged.groupby('name').size().reset_index(name='contact_count')
contacts_per_account = contacts_per_account.sort_values('contact_count', ascending=False)
print("\nTop 10 Accounts by Contact Count:")
print(contacts_per_account.head(10))
# Visualize
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
top10 = contacts_per_account.head(10)
plt.barh(range(len(top10)), top10['contact_count'])
plt.yticks(range(len(top10)), top10['name'])
plt.xlabel('Number of Contacts')
plt.title('Top 10 Accounts by Contact Count')
plt.tight_layout()
plt.show()
# Save results
ingest_to_dataverse(contacts_per_account, "Production", "contacts_per_account_analysis")
Example 4: Three-Way Join (Accounts + Contacts + Orders)
import pandas as pd
# Query all three tables
accounts = query_dataverse("Production", "account",
                           select_columns=["accountid", "name", "industrycode"])
contacts = query_dataverse("Production", "contact",
                           select_columns=["contactid", "fullname", "parentcustomerid"])
orders = query_dataverse("Production", "salesorder",
                         select_columns=["salesorderid", "customerid", "totalamount", "createdon"],
                         top=10000)
# First join: Accounts + Contacts
acc_con = pd.merge(accounts, contacts,
                   left_on='accountid', right_on='parentcustomerid',
                   how='inner')
# Second join: Add Orders
full = pd.merge(acc_con, orders,
                left_on='accountid', right_on='customerid',
                how='inner')
print(f"Complete dataset: {len(full)} rows")
# Analysis: Revenue by industry
industry_revenue = full.groupby('industrycode')['totalamount'].sum().reset_index()
industry_revenue.columns = ['industry', 'total_order_value']
industry_revenue = industry_revenue.sort_values('total_order_value', ascending=False)
print("\nRevenue by Industry (from orders):")
print(industry_revenue.head(10))
# Save
ingest_to_dataverse(industry_revenue, "Production", "industry_order_revenue")
STEP 5: Join Tables – Cross Environment
Why Cross-Environment Joins?
- Compare Dev vs Prod data
- Migrate data with validation
- Consolidate data from multiple orgs
- Build unified reports across environments
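The comparison pattern in Example 5 below relies on an outer merge with indicator=True. A tiny sketch with made-up account names shows what the _merge column reports:

```python
import pandas as pd

# Toy stand-ins for accounts pulled from two environments
prod = pd.DataFrame({"name": ["Contoso", "Fabrikam"], "revenue": [100, 200]})
dev = pd.DataFrame({"name": ["Fabrikam", "Northwind"], "revenue": [210, 50]})

# Outer merge keeps every account; indicator=True adds a _merge
# column saying which environment each row came from
comparison = pd.merge(prod, dev, on="name", how="outer",
                      suffixes=("_prod", "_dev"), indicator=True)
print(comparison["_merge"].value_counts())
# left_only = only in prod, right_only = only in dev, both = in both
```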
Example 5: Compare Production vs Development
import pandas as pd
# Query Production accounts
prod_accounts = query_dataverse("Production", "account",
                                select_columns=["name", "revenue", "accountid"])
# Query Development accounts
dev_accounts = query_dataverse("Development", "account",
                               select_columns=["name", "revenue", "accountid"])
print(f"Production: {len(prod_accounts)} accounts")
print(f"Development: {len(dev_accounts)} accounts")
# Compare by name
comparison = pd.merge(
    prod_accounts,
    dev_accounts,
    on='name',
    how='outer',  # Keep all records from both
    suffixes=('_prod', '_dev'),
    indicator=True  # Shows which side each record came from
)
# Analyze differences
only_prod = comparison[comparison['_merge'] == 'left_only']
only_dev = comparison[comparison['_merge'] == 'right_only']
in_both = comparison[comparison['_merge'] == 'both'].copy()  # copy so we can add columns safely
print(f"\nOnly in Production: {len(only_prod)}")
print(f"Only in Development: {len(only_dev)}")
print(f"In both: {len(in_both)}")
# Revenue comparison for accounts in both
if len(in_both) > 0:
    in_both['revenue_diff'] = in_both['revenue_prod'] - in_both['revenue_dev']
    print("\nRevenue differences (Prod - Dev):")
    print(in_both[['name', 'revenue_prod', 'revenue_dev', 'revenue_diff']].head(10))
# Save comparison report
ingest_to_dataverse(comparison, "Production", "prod_vs_dev_comparison")
Example 6: Cross-Org Customer Consolidation
import pandas as pd
# Scenario: You have separate Dataverse orgs for different regions
# and want to consolidate customer data
# Query North America org
na_customers = query_dataverse("North_America", "account",
                               select_columns=["name", "revenue", "address1_city", "address1_country"])
# Query Europe org
eu_customers = query_dataverse("Europe", "account",
                               select_columns=["name", "revenue", "address1_city", "address1_country"])
# Query Asia org
asia_customers = query_dataverse("Asia", "account",
                                 select_columns=["name", "revenue", "address1_city", "address1_country"])
# Add region identifier
na_customers['region'] = 'North America'
eu_customers['region'] = 'Europe'
asia_customers['region'] = 'Asia'
# Consolidate
global_customers = pd.concat([na_customers, eu_customers, asia_customers], ignore_index=True)
print(f"Total global customers: {len(global_customers)}")
# Analysis by region
region_stats = global_customers.groupby('region').agg({
    'name': 'count',
    'revenue': 'sum'
}).reset_index()
region_stats.columns = ['region', 'customer_count', 'total_revenue']
print("\nGlobal Summary:")
print(region_stats)
# Visualize
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1, 2, figsize=(14, 6))
# Customer count by region
axs[0].bar(region_stats['region'], region_stats['customer_count'])
axs[0].set_title('Customers by Region')
axs[0].set_ylabel('Count')
# Revenue by region
axs[1].bar(region_stats['region'], region_stats['total_revenue'] / 1000000)
axs[1].set_title('Revenue by Region')
axs[1].set_ylabel('Revenue ($ Millions)')
plt.tight_layout()
plt.show()
# Save consolidated data to a central environment
ingest_to_dataverse(global_customers, "Global_HQ", "consolidated_customers")
STEP 6: Visualize Your Results
Example 7: Multiple Visualizations
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Query data
df = query_dataverse("Production", "account",
                     select_columns=["name", "revenue", "industrycode", "createdon"],
                     filter_query="revenue ne null")
# Convert createdon to datetime
df['createdon'] = pd.to_datetime(df['createdon'])
df['year'] = df['createdon'].dt.year
# Create multiple plots
fig, axs = plt.subplots(2, 2, figsize=(16, 12))
# Plot 1: Revenue distribution
axs[0, 0].hist(df['revenue'] / 1000, bins=50, edgecolor='black', alpha=0.7)
axs[0, 0].set_title('Revenue Distribution', fontsize=14, fontweight='bold')
axs[0, 0].set_xlabel('Revenue ($1000s)')
axs[0, 0].set_ylabel('Number of Accounts')
axs[0, 0].grid(alpha=0.3)
# Plot 2: Top 10 industries
industry_counts = df['industrycode'].value_counts().head(10)
axs[0, 1].barh(range(len(industry_counts)), industry_counts.values)
axs[0, 1].set_yticks(range(len(industry_counts)))
axs[0, 1].set_yticklabels(industry_counts.index)
axs[0, 1].set_title('Top 10 Industries by Account Count', fontsize=14, fontweight='bold')
axs[0, 1].set_xlabel('Number of Accounts')
# Plot 3: Revenue by year
yearly_revenue = df.groupby('year')['revenue'].sum() / 1000000
axs[1, 0].plot(yearly_revenue.index, yearly_revenue.values, marker='o', linewidth=2, markersize=8)
axs[1, 0].set_title('Revenue Trend by Year', fontsize=14, fontweight='bold')
axs[1, 0].set_xlabel('Year')
axs[1, 0].set_ylabel('Total Revenue ($ Millions)')
axs[1, 0].grid(alpha=0.3)
# Plot 4: Industry revenue pie chart
top_industries = df.groupby('industrycode')['revenue'].sum().nlargest(5)
axs[1, 1].pie(top_industries.values, labels=top_industries.index, autopct='%1.1f%%')
axs[1, 1].set_title('Revenue Share - Top 5 Industries', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
print("✅ Visualizations created!")
STEP 7: Save Results Back to Dataverse
Why Save Results?
- Share analysis with your team
- Use results in Power BI reports
- Create audit trails
- Build automated reporting pipelines
- Historical tracking of analysis results
Example 8: Complete Analysis Pipeline with Save
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
# Get data
df = query_dataverse("Production", "account",
                     select_columns=["accountid", "name", "revenue", "industrycode"])
# Perform analysis
summary = df.groupby('industrycode').agg({
    'accountid': 'count',
    'revenue': ['sum', 'mean', 'min', 'max']
}).reset_index()
summary.columns = ['industry', 'account_count', 'total_revenue', 'avg_revenue', 'min_revenue', 'max_revenue']
# Add metadata
summary['analysis_date'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
summary['data_source'] = 'Production'
# Sort by total revenue
summary = summary.sort_values('total_revenue', ascending=False)
print("Analysis Summary:")
print(summary.head(10))
# Visualize
plt.figure(figsize=(12, 6))
plt.bar(summary['industry'].head(10), summary['total_revenue'].head(10) / 1000000)
plt.xticks(rotation=45, ha='right')
plt.title('Top 10 Industries by Revenue')
plt.ylabel('Revenue ($ Millions)')
plt.tight_layout()
plt.show()
# Save to staging table in Production
result = ingest_to_dataverse(summary, "Production", "industry_analysis_results")
print("\n✅ Analysis saved!")
print(f"   Table: {result['table']}")
print(f"   Records: {result['success']}")
print(f"   Errors: {result['errors']}")
# You can also save to a different environment
result2 = ingest_to_dataverse(summary, "Development", "industry_analysis_results")
print("\n✅ Also saved to Development!")
STEP 8: Complete Real-World Examples
🎯 Complete Example 1: Customer Segmentation (RFM Analysis)
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
# STEP 1: Get customer data from Production
customers = query_dataverse("Production", "account",
                            select_columns=["accountid", "name", "createdon"])
# STEP 2: Get order data
orders = query_dataverse("Production", "salesorder",
                         select_columns=["salesorderid", "customerid", "totalamount", "createdon"],
                         filter_query="statecode eq 0",
                         top=50000)
print(f"Customers: {len(customers)}")
print(f"Orders: {len(orders)}")
# STEP 3: Calculate RFM metrics
current_date = pd.Timestamp.now()
orders['createdon'] = pd.to_datetime(orders['createdon'])
rfm = orders.groupby('customerid').agg({
    'createdon': lambda x: (current_date - x.max()).days,  # Recency
    'salesorderid': 'count',                               # Frequency
    'totalamount': 'sum'                                   # Monetary
}).reset_index()
rfm.columns = ['customerid', 'recency_days', 'order_count', 'total_spent']
# STEP 4: Score each metric (1-5)
rfm['recency_score'] = pd.qcut(rfm['recency_days'], 5, labels=[5, 4, 3, 2, 1])
rfm['frequency_score'] = pd.qcut(rfm['order_count'].rank(method='first'), 5, labels=[1, 2, 3, 4, 5])
rfm['monetary_score'] = pd.qcut(rfm['total_spent'], 5, labels=[1, 2, 3, 4, 5])
# Total RFM score
rfm['rfm_score'] = (rfm['recency_score'].astype(int) +
                    rfm['frequency_score'].astype(int) +
                    rfm['monetary_score'].astype(int))
# STEP 5: Segment customers
def segment_customer(score):
    if score >= 13:
        return 'Champions'
    elif score >= 10:
        return 'Loyal'
    elif score >= 7:
        return 'Potential'
    else:
        return 'At Risk'

rfm['segment'] = rfm['rfm_score'].apply(segment_customer)
# STEP 6: Join with customer names
rfm_with_names = pd.merge(rfm, customers, left_on='customerid', right_on='accountid', how='left')
# STEP 7: Segment summary
segment_summary = rfm.groupby('segment').agg({
    'customerid': 'count',
    'total_spent': 'sum',
    'order_count': 'sum'
}).reset_index()
segment_summary.columns = ['segment', 'customer_count', 'total_revenue', 'total_orders']
print("\n📊 Customer Segments:")
print(segment_summary)
# STEP 8: Visualize
fig, axs = plt.subplots(1, 3, figsize=(18, 5))
# Customers by segment
axs[0].bar(segment_summary['segment'], segment_summary['customer_count'])
axs[0].set_title('Customers by Segment')
axs[0].set_ylabel('Count')
axs[0].tick_params(axis='x', rotation=45)
# Revenue by segment
axs[1].bar(segment_summary['segment'], segment_summary['total_revenue'] / 1000000)
axs[1].set_title('Revenue by Segment')
axs[1].set_ylabel('Revenue ($ Millions)')
axs[1].tick_params(axis='x', rotation=45)
# Orders by segment
axs[2].bar(segment_summary['segment'], segment_summary['total_orders'])
axs[2].set_title('Orders by Segment')
axs[2].set_ylabel('Order Count')
axs[2].tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
# STEP 9: Save results
result1 = ingest_to_dataverse(rfm_with_names, "Production", "customer_rfm_scores")
result2 = ingest_to_dataverse(segment_summary, "Production", "customer_segments")
print("\n✅ Analysis complete!")
print(f"   RFM Scores saved: {result1['table']}")
print(f"   Segment Summary saved: {result2['table']}")
print("\n💡 Next steps:")
print("   - Use these segments in marketing campaigns")
print("   - Build Power BI reports from staging tables")
print("   - Set up automated monthly RFM updates")
🎯 Complete Example 2: Cross-Environment Data Migration
import pandas as pd
# SCENARIO: Migrate cleaned customer data from Dev to Prod
# STEP 1: Extract from Development
print("📤 STEP 1: Extracting from Development…")
dev_data = query_dataverse("Development", "account",
                           select_columns=["name", "revenue", "industrycode",
                                           "address1_city", "emailaddress1"],
                           filter_query="statecode eq 0")  # Active only
print(f"   Extracted: {len(dev_data)} records")
# STEP 2: Data Quality Checks
print("\n🔍 STEP 2: Running quality checks…")
initial_count = len(dev_data)
# Remove nulls
dev_data = dev_data.dropna(subset=['name'])
print(f"   After null removal: {len(dev_data)} records")
# Remove duplicates
dev_data = dev_data.drop_duplicates(subset=['name'])
print(f"   After deduplication: {len(dev_data)} records")
# Validate revenue
dev_data = dev_data[dev_data['revenue'] > 0].copy()  # copy so later column adds don't warn
print(f"   After validation: {len(dev_data)} records")
removed = initial_count - len(dev_data)
print(f"   ❌ Removed {removed} invalid records")
# STEP 3: Check if records exist in Production
print("\n🔍 STEP 3: Checking for duplicates in Production…")
prod_data = query_dataverse("Production", "account",
                            select_columns=["name", "accountid"])
# Find new records only
dev_data['exists_in_prod'] = dev_data['name'].isin(prod_data['name'])
new_records = dev_data[~dev_data['exists_in_prod']].copy()
new_records = new_records.drop('exists_in_prod', axis=1)
print(f"   Total in Dev: {len(dev_data)}")
print(f"   Already in Prod: {len(dev_data) - len(new_records)}")
print(f"   New to migrate: {len(new_records)}")
# STEP 4: Transform data (if needed)
print("\n🔄 STEP 4: Transforming data…")
new_records['name'] = new_records['name'].str.upper()
new_records['migration_date'] = pd.Timestamp.now().strftime('%Y-%m-%d')
new_records['data_source'] = 'Development'
# STEP 5: Ingest to Production
print("\n📥 STEP 5: Ingesting to Production…")
result = ingest_to_dataverse(new_records, "Production", "migrated_accounts")
# STEP 6: Verification
print("\n✅ MIGRATION COMPLETE!")
print(f"   Target table: {result['table']}")
print(f"   Successfully migrated: {result['success']} records")
print(f"   Errors: {result['errors']} records")
print("\n📊 Summary:")
print(f"   Started with: {initial_count} records (Dev)")
print(f"   After cleaning: {len(dev_data)} records")
print(f"   New records: {len(new_records)}")
print(f"   Migrated: {result['success']}")
print("\n💡 Next: Review staging table in Production and move to final table")
🎯 Complete Example 3: Multi-Environment Consolidated Report
import pandas as pd
import matplotlib.pyplot as plt
# SCENARIO: Create consolidated revenue report from all environments
print("📊 BUILDING MULTI-ENVIRONMENT REVENUE REPORT")
print("=" * 60)
# STEP 1: Query each environment
environments = ["Production", "UAT", "Development"]
all_data = []
for env in environments:
    print(f"\n📍 Querying {env}…")
    try:
        df = query_dataverse(env, "account",
                             select_columns=["name", "revenue", "industrycode"],
                             filter_query="revenue ne null")
        df['environment'] = env
        all_data.append(df)
        print(f"   ✅ {len(df)} records")
    except Exception as e:
        print(f"   ❌ Error: {e}")
# STEP 2: Consolidate
consolidated = pd.concat(all_data, ignore_index=True)
print(f"\n📦 Total consolidated: {len(consolidated)} records")
# STEP 3: Analysis by environment
env_stats = consolidated.groupby('environment').agg({
    'revenue': ['sum', 'mean', 'count']
}).reset_index()
env_stats.columns = ['environment', 'total_revenue', 'avg_revenue', 'account_count']
print("\n📊 Environment Summary:")
print(env_stats)
# STEP 4: Industry analysis across all environments
industry_stats = consolidated.groupby('industrycode').agg({
    'revenue': 'sum',
    'name': 'count'
}).reset_index()
industry_stats.columns = ['industry', 'total_revenue', 'account_count']
industry_stats = industry_stats.sort_values('total_revenue', ascending=False)
print("\n📊 Top 10 Industries (All Environments):")
print(industry_stats.head(10))
# STEP 5: Cross-environment comparison by industry
industry_by_env = consolidated.pivot_table(
    index='industrycode',
    columns='environment',
    values='revenue',
    aggfunc='sum',
    fill_value=0
).reset_index()
print("\n📊 Revenue by Industry and Environment:")
print(industry_by_env.head(10))
# STEP 6: Visualize
fig, axs = plt.subplots(2, 2, figsize=(16, 12))
# Chart 1: Revenue by environment
axs[0, 0].bar(env_stats['environment'], env_stats['total_revenue'] / 1000000)
axs[0, 0].set_title('Total Revenue by Environment')
axs[0, 0].set_ylabel('Revenue ($ Millions)')
axs[0, 0].tick_params(axis='x', rotation=45)
# Chart 2: Account count by environment
axs[0, 1].bar(env_stats['environment'], env_stats['account_count'])
axs[0, 1].set_title('Account Count by Environment')
axs[0, 1].set_ylabel('Number of Accounts')
axs[0, 1].tick_params(axis='x', rotation=45)
# Chart 3: Top 10 industries
top10_ind = industry_stats.head(10)
axs[1, 0].barh(range(len(top10_ind)), top10_ind['total_revenue'] / 1000000)
axs[1, 0].set_yticks(range(len(top10_ind)))
axs[1, 0].set_yticklabels(top10_ind['industry'])
axs[1, 0].set_xlabel('Revenue ($ Millions)')
axs[1, 0].set_title('Top 10 Industries (All Environments)')
# Chart 4: Grouped bars - Revenue by environment and industry (top 5)
top5_ind = industry_stats.head(5)['industry']
env_ind_subset = industry_by_env[industry_by_env['industrycode'].isin(top5_ind)]
x = range(len(environments))
width = 0.15
for i, industry in enumerate(top5_ind):
    row = env_ind_subset[env_ind_subset['industrycode'] == industry]
    values = [row[env].values[0] / 1000000 if env in row.columns else 0 for env in environments]
    axs[1, 1].bar([p + width * i for p in x], values, width, label=industry)
axs[1, 1].set_xlabel('Environment')
axs[1, 1].set_ylabel('Revenue ($ Millions)')
axs[1, 1].set_title('Top 5 Industries by Environment')
axs[1, 1].set_xticks([p + width * 2 for p in x])
axs[1, 1].set_xticklabels(environments)
axs[1, 1].legend()
plt.tight_layout()
plt.show()
# STEP 7: Save consolidated report
print("\n💾 Saving results…")
# Save environment summary
result1 = ingest_to_dataverse(env_stats, "Production", "multi_env_summary")
# Save industry analysis
result2 = ingest_to_dataverse(industry_stats, "Production", "multi_env_industry_analysis")
# Save detailed data
result3 = ingest_to_dataverse(consolidated, "Production", "multi_env_consolidated_data")
print("\n✅ REPORT COMPLETE!")
print(f"   Environment summary: {result1['table']}")
print(f"   Industry analysis: {result2['table']}")
print(f"   Consolidated data: {result3['table']}")
print(f"\n📊 Total records saved: {result1['success'] + result2['success'] + result3['success']}")
🎓 Quick Reference
Essential Functions
| Function | Purpose | Example |
|---|---|---|
| list_environments() | See connected environments | list_environments() |
| query_dataverse() | Get data from a table | df = query_dataverse("Prod", "account") |
| ingest_to_dataverse() | Save results back | ingest_to_dataverse(df, "Prod", "results") |
| pd.merge() | Join DataFrames | pd.merge(df1, df2, on='key') |
| pd.concat() | Stack DataFrames | pd.concat([df1, df2]) |
| plt.show() | Display plot | plt.bar(x, y); plt.show() |
Common pandas Operations
| Task | Code |
|---|---|
| Group by column | df.groupby('column').sum() |
| Filter rows | df[df['revenue'] > 1000000] |
| Sort | df.sort_values('revenue', ascending=False) |
| Top N | df.nlargest(10, 'revenue') |
| Count unique | df['column'].nunique() |
| Remove nulls | df.dropna() |
| Remove duplicates | df.drop_duplicates() |
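A quick sanity check of a few of these operations on a throwaway DataFrame (the values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "industry": ["Tech", "Tech", "Retail"],
    "revenue": [3_000_000, 1_500_000, 500_000],
})

big = df[df["revenue"] > 1_000_000]               # filter rows
top1 = df.nlargest(1, "revenue")                  # top N
by_ind = df.groupby("industry")["revenue"].sum()  # group by

print(len(big), top1["revenue"].iloc[0], by_ind["Tech"])
# 2 3000000 4500000
```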
Join Types
| Join Type | Description | Code |
|---|---|---|
| Inner Join | Only matching records | pd.merge(df1, df2, on='key', how='inner') |
| Left Join | All from left, matching from right | pd.merge(df1, df2, on='key', how='left') |
| Right Join | All from right, matching from left | pd.merge(df1, df2, on='key', how='right') |
| Outer Join | All records from both | pd.merge(df1, df2, on='key', how='outer') |
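To see how the four join types differ in practice, here is a tiny sketch with two throwaway DataFrames that share only one key:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2], "a": ["x", "y"]})
right = pd.DataFrame({"key": [2, 3], "b": ["p", "q"]})

# Row counts for each join type on the same pair of frames
counts = {how: len(pd.merge(left, right, on="key", how=how))
          for how in ["inner", "left", "right", "outer"]}
print(counts)
# {'inner': 1, 'left': 2, 'right': 2, 'outer': 3}
```

Only key 2 exists on both sides, so inner keeps one row while outer keeps all three keys.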
💡 Pro Tips
Performance Tips
- ✅ Use select_columns to only get what you need
- ✅ Use filter_query to reduce data at source
- ✅ Use the top parameter for large tables
- ✅ Query during off-peak hours for large datasets
- ✅ Save intermediate results to staging tables
Best Practices
- ✅ Always check data quality before joining
- ✅ Use .head() to preview before full processing
- ✅ Add metadata (date, source) to analysis results
- ✅ Test joins on small subsets first
- ✅ Document your analysis in comments
- ✅ Save results with descriptive table names
Troubleshooting
- ❌ “Environment not found” → Check name matches exactly (case-sensitive)
- ❌ “No records returned” → Check filter_query syntax
- ❌ “Join produces no results” → Check key column names match
- ❌ “Too many records” → Use the top parameter
- ❌ “Memory error” → Process in smaller batches
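For the last point, one simple batching pattern is to aggregate a large DataFrame in fixed-size chunks and then combine the partial results; a minimal sketch with toy data:

```python
import pandas as pd

# Toy stand-in for a large query result
df = pd.DataFrame({"revenue": range(10)})

# Aggregate in fixed-size chunks, then combine the partial results
chunk_size = 4
partial = []
for start in range(0, len(df), chunk_size):
    chunk = df.iloc[start:start + chunk_size]
    partial.append(chunk["revenue"].sum())
total = sum(partial)
print(total)  # 45, same as df["revenue"].sum()
```

The same pattern works with repeated query_dataverse calls using the top parameter to cap each batch.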
🎉 Summary
You Now Know How To:
- ✅ Connect unlimited environments
- ✅ Query tables from any environment
- ✅ Filter and select specific data
- ✅ Analyze single tables with pandas
- ✅ Join tables from same environment
- ✅ Join tables across different environments
- ✅ Consolidate data from multiple orgs
- ✅ Create visualizations with matplotlib
- ✅ Save results back to Dataverse
- ✅ Build complete ETL pipelines
You have the power to analyze ANY data from ANY environment! 🚀
