MDPN Complete Python Tab Tutorial – Query, Analyze & Join Tables


From Setup to Advanced Multi-Environment Joins & Analysis

📋 Table of Contents

  1. Initial Setup – Add Environments
  2. Basics – List Tables & Query Data
  3. Single Table Analysis
  4. Join Tables – Same Environment
  5. Join Tables – Cross Environment
  6. Visualize Results
  7. Save Results Back to Dataverse
  8. Complete Real-World Examples

STEP 1: Initial Setup – Add Your Environments

1.1 Add Environments

Before using the Python tab, connect your environments:

1. Click the ☰ menu → "🌐 Environments"
2. Click "Add New Environment"
3. Enter:
   - Name: "Production" (your choice)
   - URL: https://yourorg.crm.dynamics.com
4. Click "Connect"
5. Complete device code authentication
6. Click "Load All Tables"
7. Repeat for every environment you need: Production, Development, UAT, Sandbox, and so on

✅ You can add UNLIMITED environments!

1.2 Verify Your Environments

After adding environments, verify they’re connected:

1. Go to the Python tab (☰ menu → 🐍 Python)
2. Wait 30-60 seconds on first load (the Python runtime is downloaded)
3. Run this code to list your environments:

```python
# List all connected environments
list_environments()

# Output will show:
# 📋 Available Environments:
# 1. Production
# 2. Development
# 3. UAT
# 4. Sandbox
```

STEP 2: Basics – Query Your First Table

2.1 Simple Query – Get All Records

```python
import pandas as pd

# Query all accounts from Production (up to 5000 records)
df = query_dataverse("Production", "account")

# See what you got
print(f"Retrieved {len(df)} records")
print(f"Columns: {len(df.columns)}")
print("\nFirst 5 rows:")
print(df.head())
print("\nColumn names:")
print(df.columns.tolist())
```

What this does:

  • Queries the “account” table from “Production” environment
  • Returns up to 5000 records as a pandas DataFrame
  • Shows you the data structure

2.2 Query Specific Columns

```python
import pandas as pd

# Only get the columns you need
df = query_dataverse("Production", "account",
                     select_columns=["accountid", "name", "revenue", "industrycode"])

print(df.head())
print(f"\nShape: {df.shape}")  # (rows, columns)
```

Why do this: queries run faster and less data is transferred.

2.3 Query with Filters

```python
import pandas as pd

# Get only high-value accounts
df = query_dataverse("Production", "account",
                     select_columns=["name", "revenue", "industrycode"],
                     filter_query="revenue gt 1000000",
                     top=100)

print(f"High-value accounts: {len(df)}")
print(df.head())
```

Filter Examples:

| Filter | OData Query |
|---|---|
| Revenue > $1M | `revenue gt 1000000` |
| Name starts with "A" | `startswith(name, 'A')` |
| Active accounts | `statecode eq 0` |
| Not null revenue | `revenue ne null` |
| Multiple conditions | `revenue gt 1000000 and statecode eq 0` |
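The same conditions can also be re-applied client-side after a query. As a minimal sketch (toy data, no real environment involved), each OData filter above maps onto a pandas boolean mask:

```python
import pandas as pd

# Toy data standing in for a query result (hypothetical values)
df = pd.DataFrame({
    "name": ["Acme", "Beta Corp", "Apex"],
    "revenue": [2_500_000, 800_000, 1_200_000],
    "statecode": [0, 0, 1],
})

# OData: revenue gt 1000000  ->  boolean mask
high_value = df[df["revenue"] > 1_000_000]

# OData: startswith(name, 'A')  ->  string accessor
starts_a = df[df["name"].str.startswith("A")]

# OData: revenue gt 1000000 and statecode eq 0  ->  combined mask
combined = df[(df["revenue"] > 1_000_000) & (df["statecode"] == 0)]

print(len(high_value), len(starts_a), len(combined))  # 2 2 1
```

Prefer `filter_query` when possible, since it filters at the source; masks like these are for further refinement after the data arrives.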

STEP 3: Single Table Analysis

Example 1: Revenue Analysis by Industry

```python
import pandas as pd
import matplotlib.pyplot as plt

# Get account data
df = query_dataverse("Production", "account",
                     select_columns=["name", "revenue", "industrycode"],
                     filter_query="revenue ne null and industrycode ne null")

print(f"Analyzing {len(df)} accounts")

# Group by industry
industry_stats = df.groupby('industrycode').agg({
    'revenue': ['sum', 'mean', 'count']
}).reset_index()
industry_stats.columns = ['industry', 'total_revenue', 'avg_revenue', 'count']

# Sort by total revenue
industry_stats = industry_stats.sort_values('total_revenue', ascending=False)

print("\nTop 10 Industries:")
print(industry_stats.head(10))

# Visualize
plt.figure(figsize=(12, 6))
top10 = industry_stats.head(10)
plt.bar(range(len(top10)), top10['total_revenue'] / 1000000)  # Convert to millions
plt.xticks(range(len(top10)), top10['industry'], rotation=45, ha='right')
plt.title('Top 10 Industries by Revenue', fontsize=16, fontweight='bold')
plt.xlabel('Industry')
plt.ylabel('Total Revenue ($ Millions)')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

# Save to Dataverse
result = ingest_to_dataverse(industry_stats, "Production", "industry_revenue_analysis")
print(f"\n✅ Analysis saved to {result['table']}")
```

Example 2: Find Top Customers

```python
import pandas as pd

# Get accounts
df = query_dataverse("Production", "account",
                     select_columns=["accountid", "name", "revenue", "createdon"],
                     filter_query="revenue ne null")

# Sort by revenue
df_sorted = df.sort_values('revenue', ascending=False)

# Top 20
top20 = df_sorted.head(20)
print("Top 20 Customers by Revenue:")
print(top20[['name', 'revenue']])

# Calculate statistics
print(f"\nTotal Revenue (Top 20): ${top20['revenue'].sum():,.2f}")
print(f"Average Revenue (Top 20): ${top20['revenue'].mean():,.2f}")
print(f"Median Revenue (Top 20): ${top20['revenue'].median():,.2f}")

# Save
ingest_to_dataverse(top20, "Production", "top20_customers")
```

STEP 4: Join Tables – Same Environment

⚡ Important Concept: Joining in pandas

After you query tables from Dataverse, you join them with pandas `merge()`, not with SQL JOINs.

This gives you full pandas power for complex analysis!
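Before working with real tables, here is a minimal, self-contained sketch of what `merge()` does, using toy DataFrames (hypothetical IDs, no Dataverse connection needed):

```python
import pandas as pd

# Toy "accounts" and "contacts" (hypothetical IDs)
accounts = pd.DataFrame({"accountid": [1, 2, 3],
                         "name": ["Acme", "Beta", "Gamma"]})
contacts = pd.DataFrame({"contactid": [10, 11, 12],
                         "fullname": ["Ann", "Bob", "Cal"],
                         "parentcustomerid": [1, 1, 99]})

# Inner join keeps only contacts whose parent account exists
merged = pd.merge(accounts, contacts,
                  left_on="accountid", right_on="parentcustomerid",
                  how="inner")

print(merged[["name", "fullname"]])
# Acme/Ann and Acme/Bob match; Cal's parent (99) has no account row, so it drops out
```

The same `left_on`/`right_on`/`how` pattern carries over unchanged to the real account and contact tables below.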

Example 3: Join Accounts + Contacts

```python
import pandas as pd
import matplotlib.pyplot as plt

# Query accounts
accounts = query_dataverse("Production", "account",
                           select_columns=["accountid", "name", "revenue", "industrycode"])

# Query contacts
contacts = query_dataverse("Production", "contact",
                           select_columns=["contactid", "fullname", "emailaddress1", "parentcustomerid"])

print(f"Accounts: {len(accounts)}")
print(f"Contacts: {len(contacts)}")

# Join them (pandas merge)
# parentcustomerid in contacts = accountid in accounts
merged = pd.merge(
    accounts, contacts,
    left_on='accountid',
    right_on='parentcustomerid',
    how='inner'  # Only keep matches
)

print(f"\nMerged: {len(merged)} rows")
print("\nSample:")
print(merged[['name', 'fullname', 'emailaddress1', 'revenue']].head())

# Analysis: Contacts per account
contacts_per_account = merged.groupby('name').size().reset_index(name='contact_count')
contacts_per_account = contacts_per_account.sort_values('contact_count', ascending=False)

print("\nTop 10 Accounts by Contact Count:")
print(contacts_per_account.head(10))

# Visualize
plt.figure(figsize=(12, 6))
top10 = contacts_per_account.head(10)
plt.barh(range(len(top10)), top10['contact_count'])
plt.yticks(range(len(top10)), top10['name'])
plt.xlabel('Number of Contacts')
plt.title('Top 10 Accounts by Contact Count')
plt.tight_layout()
plt.show()

# Save results
ingest_to_dataverse(contacts_per_account, "Production", "contacts_per_account_analysis")
```

Example 4: Three-Way Join (Accounts + Contacts + Orders)

```python
import pandas as pd

# Query all three tables
accounts = query_dataverse("Production", "account",
                           select_columns=["accountid", "name", "industrycode"])
contacts = query_dataverse("Production", "contact",
                           select_columns=["contactid", "fullname", "parentcustomerid"])
orders = query_dataverse("Production", "salesorder",
                         select_columns=["salesorderid", "customerid", "totalamount", "createdon"],
                         top=10000)

# First join: Accounts + Contacts
acc_con = pd.merge(accounts, contacts,
                   left_on='accountid', right_on='parentcustomerid', how='inner')

# Second join: Add Orders
full = pd.merge(acc_con, orders,
                left_on='accountid', right_on='customerid', how='inner')

print(f"Complete dataset: {len(full)} rows")

# Analysis: Revenue by industry
industry_revenue = full.groupby('industrycode')['totalamount'].sum().reset_index()
industry_revenue.columns = ['industry', 'total_order_value']
industry_revenue = industry_revenue.sort_values('total_order_value', ascending=False)

print("\nRevenue by Industry (from orders):")
print(industry_revenue.head(10))

# Save
ingest_to_dataverse(industry_revenue, "Production", "industry_order_revenue")
```

STEP 5: Join Tables – Cross Environment

Why Cross-Environment Joins?

  • Compare Dev vs Prod data
  • Migrate data with validation
  • Consolidate data from multiple orgs
  • Build unified reports across environments

Example 5: Compare Production vs Development

```python
import pandas as pd

# Query Production accounts
prod_accounts = query_dataverse("Production", "account",
                                select_columns=["name", "revenue", "accountid"])

# Query Development accounts
dev_accounts = query_dataverse("Development", "account",
                               select_columns=["name", "revenue", "accountid"])

print(f"Production: {len(prod_accounts)} accounts")
print(f"Development: {len(dev_accounts)} accounts")

# Compare by name
comparison = pd.merge(
    prod_accounts, dev_accounts,
    on='name',
    how='outer',                 # Keep all records from both
    suffixes=('_prod', '_dev'),
    indicator=True               # Shows which side each record came from
)

# Analyze differences
only_prod = comparison[comparison['_merge'] == 'left_only']
only_dev = comparison[comparison['_merge'] == 'right_only']
in_both = comparison[comparison['_merge'] == 'both']

print(f"\nOnly in Production: {len(only_prod)}")
print(f"Only in Development: {len(only_dev)}")
print(f"In both: {len(in_both)}")

# Revenue comparison for accounts in both
if len(in_both) > 0:
    in_both['revenue_diff'] = in_both['revenue_prod'] - in_both['revenue_dev']
    print("\nRevenue differences (Prod - Dev):")
    print(in_both[['name', 'revenue_prod', 'revenue_dev', 'revenue_diff']].head(10))

# Save comparison report
ingest_to_dataverse(comparison, "Production", "prod_vs_dev_comparison")
```

Example 6: Cross-Org Customer Consolidation

```python
import pandas as pd
import matplotlib.pyplot as plt

# Scenario: You have separate Dataverse orgs for different regions
# and want to consolidate customer data

# Query North America org
na_customers = query_dataverse("North_America", "account",
                               select_columns=["name", "revenue", "address1_city", "address1_country"])

# Query Europe org
eu_customers = query_dataverse("Europe", "account",
                               select_columns=["name", "revenue", "address1_city", "address1_country"])

# Query Asia org
asia_customers = query_dataverse("Asia", "account",
                                 select_columns=["name", "revenue", "address1_city", "address1_country"])

# Add region identifier
na_customers['region'] = 'North America'
eu_customers['region'] = 'Europe'
asia_customers['region'] = 'Asia'

# Consolidate
global_customers = pd.concat([na_customers, eu_customers, asia_customers], ignore_index=True)
print(f"Total global customers: {len(global_customers)}")

# Analysis by region
region_stats = global_customers.groupby('region').agg({
    'name': 'count',
    'revenue': 'sum'
}).reset_index()
region_stats.columns = ['region', 'customer_count', 'total_revenue']

print("\nGlobal Summary:")
print(region_stats)

# Visualize
fig, axs = plt.subplots(1, 2, figsize=(14, 6))

# Customer count by region
axs[0].bar(region_stats['region'], region_stats['customer_count'])
axs[0].set_title('Customers by Region')
axs[0].set_ylabel('Count')

# Revenue by region
axs[1].bar(region_stats['region'], region_stats['total_revenue'] / 1000000)
axs[1].set_title('Revenue by Region')
axs[1].set_ylabel('Revenue ($ Millions)')

plt.tight_layout()
plt.show()

# Save consolidated data to a central environment
ingest_to_dataverse(global_customers, "Global_HQ", "consolidated_customers")
```

STEP 6: Visualize Your Results

Example 7: Multiple Visualizations

```python
import pandas as pd
import matplotlib.pyplot as plt

# Query data
df = query_dataverse("Production", "account",
                     select_columns=["name", "revenue", "industrycode", "createdon"],
                     filter_query="revenue ne null")

# Convert createdon to datetime
df['createdon'] = pd.to_datetime(df['createdon'])
df['year'] = df['createdon'].dt.year

# Create multiple plots
fig, axs = plt.subplots(2, 2, figsize=(16, 12))

# Plot 1: Revenue distribution
axs[0, 0].hist(df['revenue'] / 1000, bins=50, edgecolor='black', alpha=0.7)
axs[0, 0].set_title('Revenue Distribution', fontsize=14, fontweight='bold')
axs[0, 0].set_xlabel('Revenue ($1000s)')
axs[0, 0].set_ylabel('Number of Accounts')
axs[0, 0].grid(alpha=0.3)

# Plot 2: Top 10 industries
industry_counts = df['industrycode'].value_counts().head(10)
axs[0, 1].barh(range(len(industry_counts)), industry_counts.values)
axs[0, 1].set_yticks(range(len(industry_counts)))
axs[0, 1].set_yticklabels(industry_counts.index)
axs[0, 1].set_title('Top 10 Industries by Account Count', fontsize=14, fontweight='bold')
axs[0, 1].set_xlabel('Number of Accounts')

# Plot 3: Revenue by year
yearly_revenue = df.groupby('year')['revenue'].sum() / 1000000
axs[1, 0].plot(yearly_revenue.index, yearly_revenue.values, marker='o', linewidth=2, markersize=8)
axs[1, 0].set_title('Revenue Trend by Year', fontsize=14, fontweight='bold')
axs[1, 0].set_xlabel('Year')
axs[1, 0].set_ylabel('Total Revenue ($ Millions)')
axs[1, 0].grid(alpha=0.3)

# Plot 4: Industry revenue pie chart
top_industries = df.groupby('industrycode')['revenue'].sum().nlargest(5)
axs[1, 1].pie(top_industries.values, labels=top_industries.index, autopct='%1.1f%%')
axs[1, 1].set_title('Revenue Share - Top 5 Industries', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

print("✅ Visualizations created!")
```

STEP 7: Save Results Back to Dataverse

Why Save Results?

  • Share analysis with your team
  • Use results in Power BI reports
  • Create audit trails
  • Build automated reporting pipelines
  • Historical tracking of analysis results

Example 8: Complete Analysis Pipeline with Save

```python
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

# Get data
df = query_dataverse("Production", "account",
                     select_columns=["accountid", "name", "revenue", "industrycode"])

# Perform analysis
summary = df.groupby('industrycode').agg({
    'accountid': 'count',
    'revenue': ['sum', 'mean', 'min', 'max']
}).reset_index()
summary.columns = ['industry', 'account_count', 'total_revenue',
                   'avg_revenue', 'min_revenue', 'max_revenue']

# Add metadata
summary['analysis_date'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
summary['data_source'] = 'Production'

# Sort by total revenue
summary = summary.sort_values('total_revenue', ascending=False)

print("Analysis Summary:")
print(summary.head(10))

# Visualize
plt.figure(figsize=(12, 6))
plt.bar(summary['industry'].head(10), summary['total_revenue'].head(10) / 1000000)
plt.xticks(rotation=45, ha='right')
plt.title('Top 10 Industries by Revenue')
plt.ylabel('Revenue ($ Millions)')
plt.tight_layout()
plt.show()

# Save to staging table in Production
result = ingest_to_dataverse(summary, "Production", "industry_analysis_results")
print(f"\n✅ Analysis saved!")
print(f"   Table: {result['table']}")
print(f"   Records: {result['success']}")
print(f"   Errors: {result['errors']}")

# You can also save to a different environment
result2 = ingest_to_dataverse(summary, "Development", "industry_analysis_results")
print(f"\n✅ Also saved to Development!")
```

STEP 8: Complete Real-World Examples

🎯 Complete Example 1: Customer Segmentation (RFM Analysis)

```python
import pandas as pd
import matplotlib.pyplot as plt

# STEP 1: Get customer data from Production
customers = query_dataverse("Production", "account",
                            select_columns=["accountid", "name", "createdon"])

# STEP 2: Get order data
orders = query_dataverse("Production", "salesorder",
                         select_columns=["salesorderid", "customerid", "totalamount", "createdon"],
                         filter_query="statecode eq 0",
                         top=50000)

print(f"Customers: {len(customers)}")
print(f"Orders: {len(orders)}")

# STEP 3: Calculate RFM metrics
current_date = pd.Timestamp.now()
orders['createdon'] = pd.to_datetime(orders['createdon'])

rfm = orders.groupby('customerid').agg({
    'createdon': lambda x: (current_date - x.max()).days,  # Recency
    'salesorderid': 'count',                               # Frequency
    'totalamount': 'sum'                                   # Monetary
}).reset_index()
rfm.columns = ['customerid', 'recency_days', 'order_count', 'total_spent']

# STEP 4: Score each metric (1-5)
rfm['recency_score'] = pd.qcut(rfm['recency_days'], 5, labels=[5, 4, 3, 2, 1])
rfm['frequency_score'] = pd.qcut(rfm['order_count'].rank(method='first'), 5, labels=[1, 2, 3, 4, 5])
rfm['monetary_score'] = pd.qcut(rfm['total_spent'], 5, labels=[1, 2, 3, 4, 5])

# Total RFM score
rfm['rfm_score'] = (rfm['recency_score'].astype(int) +
                    rfm['frequency_score'].astype(int) +
                    rfm['monetary_score'].astype(int))

# STEP 5: Segment customers
def segment_customer(score):
    if score >= 13:
        return 'Champions'
    elif score >= 10:
        return 'Loyal'
    elif score >= 7:
        return 'Potential'
    else:
        return 'At Risk'

rfm['segment'] = rfm['rfm_score'].apply(segment_customer)

# STEP 6: Join with customer names
rfm_with_names = pd.merge(rfm, customers,
                          left_on='customerid', right_on='accountid', how='left')

# STEP 7: Segment summary
segment_summary = rfm.groupby('segment').agg({
    'customerid': 'count',
    'total_spent': 'sum',
    'order_count': 'sum'
}).reset_index()
segment_summary.columns = ['segment', 'customer_count', 'total_revenue', 'total_orders']

print("\n📊 Customer Segments:")
print(segment_summary)

# STEP 8: Visualize
fig, axs = plt.subplots(1, 3, figsize=(18, 5))

# Customers by segment
axs[0].bar(segment_summary['segment'], segment_summary['customer_count'])
axs[0].set_title('Customers by Segment')
axs[0].set_ylabel('Count')
axs[0].tick_params(axis='x', rotation=45)

# Revenue by segment
axs[1].bar(segment_summary['segment'], segment_summary['total_revenue'] / 1000000)
axs[1].set_title('Revenue by Segment')
axs[1].set_ylabel('Revenue ($ Millions)')
axs[1].tick_params(axis='x', rotation=45)

# Orders by segment
axs[2].bar(segment_summary['segment'], segment_summary['total_orders'])
axs[2].set_title('Orders by Segment')
axs[2].set_ylabel('Order Count')
axs[2].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

# STEP 9: Save results
result1 = ingest_to_dataverse(rfm_with_names, "Production", "customer_rfm_scores")
result2 = ingest_to_dataverse(segment_summary, "Production", "customer_segments")

print(f"\n✅ Analysis complete!")
print(f"   RFM Scores saved: {result1['table']}")
print(f"   Segment Summary saved: {result2['table']}")
print(f"\n💡 Next steps:")
print("   - Use these segments in marketing campaigns")
print("   - Build Power BI reports from staging tables")
print("   - Set up automated monthly RFM updates")
```

🎯 Complete Example 2: Cross-Environment Data Migration

```python
import pandas as pd

# SCENARIO: Migrate cleaned customer data from Dev to Prod

# STEP 1: Extract from Development
print("📤 STEP 1: Extracting from Development...")
dev_data = query_dataverse("Development", "account",
                           select_columns=["name", "revenue", "industrycode",
                                           "address1_city", "emailaddress1"],
                           filter_query="statecode eq 0")  # Active only
print(f"   Extracted: {len(dev_data)} records")

# STEP 2: Data Quality Checks
print("\n🔍 STEP 2: Running quality checks...")
initial_count = len(dev_data)

# Remove nulls
dev_data = dev_data.dropna(subset=['name'])
print(f"   After null removal: {len(dev_data)} records")

# Remove duplicates
dev_data = dev_data.drop_duplicates(subset=['name'])
print(f"   After deduplication: {len(dev_data)} records")

# Validate revenue
dev_data = dev_data[dev_data['revenue'] > 0]
print(f"   After validation: {len(dev_data)} records")

removed = initial_count - len(dev_data)
print(f"   ❌ Removed {removed} invalid records")

# STEP 3: Check if records exist in Production
print("\n🔍 STEP 3: Checking for duplicates in Production...")
prod_data = query_dataverse("Production", "account",
                            select_columns=["name", "accountid"])

# Find new records only
dev_data['exists_in_prod'] = dev_data['name'].isin(prod_data['name'])
new_records = dev_data[~dev_data['exists_in_prod']].copy()
new_records = new_records.drop('exists_in_prod', axis=1)

print(f"   Total in Dev: {len(dev_data)}")
print(f"   Already in Prod: {len(dev_data) - len(new_records)}")
print(f"   New to migrate: {len(new_records)}")

# STEP 4: Transform data (if needed)
print("\n🔄 STEP 4: Transforming data...")
new_records['name'] = new_records['name'].str.upper()
new_records['migration_date'] = pd.Timestamp.now().strftime('%Y-%m-%d')
new_records['data_source'] = 'Development'

# STEP 5: Ingest to Production
print("\n📥 STEP 5: Ingesting to Production...")
result = ingest_to_dataverse(new_records, "Production", "migrated_accounts")

# STEP 6: Verification
print("\n✅ MIGRATION COMPLETE!")
print(f"   Target table: {result['table']}")
print(f"   Successfully migrated: {result['success']} records")
print(f"   Errors: {result['errors']} records")
print(f"\n📊 Summary:")
print(f"   Started with: {initial_count} records (Dev)")
print(f"   After cleaning: {len(dev_data)} records")
print(f"   New records: {len(new_records)}")
print(f"   Migrated: {result['success']}")
print(f"\n💡 Next: Review staging table in Production and move to final table")
```

🎯 Complete Example 3: Multi-Environment Consolidated Report

```python
import pandas as pd
import matplotlib.pyplot as plt

# SCENARIO: Create a consolidated revenue report from all environments
print("📊 BUILDING MULTI-ENVIRONMENT REVENUE REPORT")
print("=" * 60)

# STEP 1: Query each environment
environments = ["Production", "UAT", "Development"]
all_data = []

for env in environments:
    print(f"\n📍 Querying {env}...")
    try:
        df = query_dataverse(env, "account",
                             select_columns=["name", "revenue", "industrycode"],
                             filter_query="revenue ne null")
        df['environment'] = env
        all_data.append(df)
        print(f"   ✅ {len(df)} records")
    except Exception as e:
        print(f"   ❌ Error: {e}")

# STEP 2: Consolidate
consolidated = pd.concat(all_data, ignore_index=True)
print(f"\n📦 Total consolidated: {len(consolidated)} records")

# STEP 3: Analysis by environment
env_stats = consolidated.groupby('environment').agg({
    'revenue': ['sum', 'mean', 'count']
}).reset_index()
env_stats.columns = ['environment', 'total_revenue', 'avg_revenue', 'account_count']

print("\n📊 Environment Summary:")
print(env_stats)

# STEP 4: Industry analysis across all environments
industry_stats = consolidated.groupby('industrycode').agg({
    'revenue': 'sum',
    'name': 'count'
}).reset_index()
industry_stats.columns = ['industry', 'total_revenue', 'account_count']
industry_stats = industry_stats.sort_values('total_revenue', ascending=False)

print("\n📊 Top 10 Industries (All Environments):")
print(industry_stats.head(10))

# STEP 5: Cross-environment comparison by industry
industry_by_env = consolidated.pivot_table(
    index='industrycode',
    columns='environment',
    values='revenue',
    aggfunc='sum',
    fill_value=0
).reset_index()

print("\n📊 Revenue by Industry and Environment:")
print(industry_by_env.head(10))

# STEP 6: Visualize
fig, axs = plt.subplots(2, 2, figsize=(16, 12))

# Chart 1: Revenue by environment
axs[0, 0].bar(env_stats['environment'], env_stats['total_revenue'] / 1000000)
axs[0, 0].set_title('Total Revenue by Environment')
axs[0, 0].set_ylabel('Revenue ($ Millions)')
axs[0, 0].tick_params(axis='x', rotation=45)

# Chart 2: Account count by environment
axs[0, 1].bar(env_stats['environment'], env_stats['account_count'])
axs[0, 1].set_title('Account Count by Environment')
axs[0, 1].set_ylabel('Number of Accounts')
axs[0, 1].tick_params(axis='x', rotation=45)

# Chart 3: Top 10 industries
top10_ind = industry_stats.head(10)
axs[1, 0].barh(range(len(top10_ind)), top10_ind['total_revenue'] / 1000000)
axs[1, 0].set_yticks(range(len(top10_ind)))
axs[1, 0].set_yticklabels(top10_ind['industry'])
axs[1, 0].set_xlabel('Revenue ($ Millions)')
axs[1, 0].set_title('Top 10 Industries (All Environments)')

# Chart 4: Grouped bars - Revenue by environment and industry (top 5)
top5_ind = industry_stats.head(5)['industry']
env_ind_subset = industry_by_env[industry_by_env['industrycode'].isin(top5_ind)]

x = range(len(environments))
width = 0.15
for i, industry in enumerate(top5_ind):
    row = env_ind_subset[env_ind_subset['industrycode'] == industry]
    values = [row[env].values[0] / 1000000 if env in row.columns else 0
              for env in environments]
    axs[1, 1].bar([p + width * i for p in x], values, width, label=industry)

axs[1, 1].set_xlabel('Environment')
axs[1, 1].set_ylabel('Revenue ($ Millions)')
axs[1, 1].set_title('Top 5 Industries by Environment')
axs[1, 1].set_xticks([p + width * 2 for p in x])
axs[1, 1].set_xticklabels(environments)
axs[1, 1].legend()

plt.tight_layout()
plt.show()

# STEP 7: Save consolidated report
print("\n💾 Saving results...")
result1 = ingest_to_dataverse(env_stats, "Production", "multi_env_summary")
result2 = ingest_to_dataverse(industry_stats, "Production", "multi_env_industry_analysis")
result3 = ingest_to_dataverse(consolidated, "Production", "multi_env_consolidated_data")

print(f"\n✅ REPORT COMPLETE!")
print(f"   Environment summary: {result1['table']}")
print(f"   Industry analysis: {result2['table']}")
print(f"   Consolidated data: {result3['table']}")
print(f"\n📊 Total records saved: {result1['success'] + result2['success'] + result3['success']}")
```

🎓 Quick Reference

Essential Functions

| Function | Purpose | Example |
|---|---|---|
| `list_environments()` | See connected environments | `list_environments()` |
| `query_dataverse()` | Get data from a table | `df = query_dataverse("Prod", "account")` |
| `ingest_to_dataverse()` | Save results back | `ingest_to_dataverse(df, "Prod", "results")` |
| `pd.merge()` | Join DataFrames | `pd.merge(df1, df2, on='key')` |
| `pd.concat()` | Stack DataFrames | `pd.concat([df1, df2])` |
| `plt.show()` | Display a plot | `plt.bar(x, y); plt.show()` |

Common pandas Operations

| Task | Code |
|---|---|
| Group by column | `df.groupby('column').sum()` |
| Filter rows | `df[df['revenue'] > 1000000]` |
| Sort | `df.sort_values('revenue', ascending=False)` |
| Top N | `df.nlargest(10, 'revenue')` |
| Count unique | `df['column'].nunique()` |
| Remove nulls | `df.dropna()` |
| Remove duplicates | `df.drop_duplicates()` |
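These operations chain naturally. As a self-contained sketch on toy data (hypothetical values), the filter, group, and top-N steps compose into one short pipeline:

```python
import pandas as pd

# Toy data standing in for a query result
df = pd.DataFrame({
    "industry": ["Tech", "Tech", "Retail", "Retail", "Energy"],
    "revenue": [5.0, 3.0, 4.0, None, 9.0],
})

# Drop nulls, total revenue per industry, then take the top 2
top = (df.dropna(subset=["revenue"])
         .groupby("industry", as_index=False)["revenue"].sum()
         .nlargest(2, "revenue"))

print(top)  # Energy (9.0) and Tech (8.0)
```

The same chain applies unchanged to a DataFrame returned by `query_dataverse()`.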

Join Types

| Join Type | Description | Code |
|---|---|---|
| Inner Join | Only matching records | `pd.merge(df1, df2, on='key', how='inner')` |
| Left Join | All from left, matching from right | `pd.merge(df1, df2, on='key', how='left')` |
| Right Join | All from right, matching from left | `pd.merge(df1, df2, on='key', how='right')` |
| Outer Join | All records from both | `pd.merge(df1, df2, on='key', how='outer')` |
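A quick way to internalize the difference is to count the rows each join type produces on the same pair of toy DataFrames (hypothetical keys, nothing queried):

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "r": ["x", "y", "z"]})

# Row count per join type: keys 2 and 3 match; 1 exists only on the
# left, 4 only on the right
counts = {how: len(pd.merge(left, right, on="key", how=how))
          for how in ["inner", "left", "right", "outer"]}

print(counts)  # {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}
```

When a join between real tables produces surprising counts, running this kind of miniature check on a small sample is often the fastest diagnosis.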

💡 Pro Tips

Performance Tips

  • ✅ Use select_columns to only get what you need
  • ✅ Use filter_query to reduce data at source
  • ✅ Use top parameter for large tables
  • ✅ Query during off-peak hours for large datasets
  • ✅ Save intermediate results to staging tables

Best Practices

  • ✅ Always check data quality before joining
  • ✅ Use .head() to preview before full processing
  • ✅ Add metadata (date, source) to analysis results
  • ✅ Test joins on small subsets first
  • ✅ Document your analysis in comments
  • ✅ Save results with descriptive table names

Troubleshooting

  • ❌ “Environment not found” → Check name matches exactly (case-sensitive)
  • ❌ “No records returned” → Check filter_query syntax
  • ❌ “Join produces no results” → Check key column names match
  • ❌ “Too many records” → Use top parameter
  • ❌ “Memory error” → Process in smaller batches
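For the memory-error case, one pattern is to process a large DataFrame in fixed-size slices rather than in one pass. This is a hedged sketch on toy data; the commented `ingest_to_dataverse` call shows where a real per-batch step would go:

```python
import pandas as pd

# Toy stand-in for a large query result
big_df = pd.DataFrame({"id": range(10), "value": range(10)})

batch_size = 4
processed = 0
batches = 0

for start in range(0, len(big_df), batch_size):
    chunk = big_df.iloc[start:start + batch_size]
    # In real use, handle each chunk separately, e.g.:
    # ingest_to_dataverse(chunk, "Production", "staging_table")
    processed += len(chunk)
    batches += 1

print(f"Processed {processed} rows in {batches} batches")  # 10 rows, 3 batches
```

Combining this with the `top` and `filter_query` parameters keeps both the download and the in-memory footprint small.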

🎉 Summary

You Now Know How To:

  • ✅ Connect unlimited environments
  • ✅ Query tables from any environment
  • ✅ Filter and select specific data
  • ✅ Analyze single tables with pandas
  • ✅ Join tables from same environment
  • ✅ Join tables across different environments
  • ✅ Consolidate data from multiple orgs
  • ✅ Create visualizations with matplotlib
  • ✅ Save results back to Dataverse
  • ✅ Build complete ETL pipelines

You have the power to analyze ANY data from ANY environment! 🚀
