🐍 Python Tab – Complete Tutorial
Query, analyze, join tables across unlimited environments with pandas & matplotlib
STEP 1: Setup – Add Environments
1.1 Connect Your Environments
1. Go to ☰ menu → “🌐 Environments”
2. Click “Add New Environment”
3. Enter:
– Name: “Production”
– URL: https://yourorg.crm.dynamics.com
4. Click “Connect” → Complete device code auth
5. Click “Load All Tables”
6. Repeat for unlimited environments!
1.2 Verify in Python Tab
# List all connected environments
list_environments()
# Output shows:
# 📋 Available Environments:
# 1. Production
# 2. Development
# 3. UAT
# 4. Sandbox
STEP 2: Query Tables
Basic Query
import pandas as pd
# Query all accounts (up to 5000 records)
df = query_dataverse("Production", "account")
print(f"Retrieved {len(df)} records with {len(df.columns)} columns")
print(df.head())
Query with Filters
# Get high-value accounts only
df = query_dataverse("Production", "account",
    select_columns=["name", "revenue", "industrycode"],
    filter_query="revenue gt 1000000",
    top=100)
print(f"High-value accounts: {len(df)}")
| Filter Type | OData Syntax |
| --- | --- |
| Greater than | revenue gt 1000000 |
| Equals | statecode eq 0 |
| Starts with | startswith(name, 'A') |
| Not null | revenue ne null |
| Multiple | revenue gt 1000000 and statecode eq 0 |
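Filters can be combined directly in filter_query. A quick sketch using the helper shown above (the column names are standard account fields; adjust to your schema):
# Active accounts over $1M whose name starts with "A"
df = query_dataverse("Production", "account",
    select_columns=["name", "revenue", "statecode"],
    filter_query="revenue gt 1000000 and statecode eq 0 and startswith(name, 'A')",
    top=500)
print(f"Matched {len(df)} accounts")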
STEP 3: Join Tables – Same Environment
Example: Accounts + Contacts
import pandas as pd
# Query both tables
accounts = query_dataverse("Production", "account",
    select_columns=["accountid", "name", "revenue"])
contacts = query_dataverse("Production", "contact",
    select_columns=["contactid", "fullname", "parentcustomerid"])
# Join them
merged = pd.merge(accounts, contacts,
    left_on='accountid',
    right_on='parentcustomerid',
    how='inner')
print(f"Merged: {len(merged)} rows")
# Analysis: Contacts per account
result = merged.groupby('name').size().reset_index(name='contact_count')
result = result.sort_values('contact_count', ascending=False)
print("Top 10 accounts by contacts:")
print(result.head(10))
# Save
ingest_to_dataverse(result, "Production", "contacts_per_account")
STEP 4: Join Tables – Cross Environment
Example: Compare Prod vs Dev
import pandas as pd
# Query from both environments
prod = query_dataverse("Production", "account",
    select_columns=["name", "revenue"])
dev = query_dataverse("Development", "account",
    select_columns=["name", "revenue"])
# Compare
comparison = pd.merge(prod, dev,
    on='name',
    how='outer',
    suffixes=('_prod', '_dev'),
    indicator=True)
# Find differences
only_prod = comparison[comparison['_merge'] == 'left_only']
only_dev = comparison[comparison['_merge'] == 'right_only']
in_both = comparison[comparison['_merge'] == 'both']
print(f"Only in Prod: {len(only_prod)}")
print(f"Only in Dev: {len(only_dev)}")
print(f"In both: {len(in_both)}")
# Save comparison
ingest_to_dataverse(comparison, "Production", "prod_vs_dev")
STEP 5: Visualizations
Create Charts
import matplotlib.pyplot as plt
# Get data
df = query_dataverse("Production", "account",
    select_columns=["industrycode", "revenue"],
    filter_query="revenue ne null")
# Aggregate
industry_revenue = df.groupby('industrycode')['revenue'].sum() / 1000000
industry_revenue = industry_revenue.sort_values(ascending=False).head(10)
# Plot
plt.figure(figsize=(12, 6))
plt.bar(range(len(industry_revenue)), industry_revenue.values)
plt.xticks(range(len(industry_revenue)), industry_revenue.index, rotation=45, ha='right')
plt.title('Top 10 Industries by Revenue', fontsize=16, fontweight='bold')
plt.ylabel('Revenue ($ Millions)')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()  # Auto-displays in browser!
STEP 6: Save Results
Ingest Analysis Results
import pandas as pd
from datetime import datetime
# Perform analysis
df = query_dataverse("Production", "account")
summary = df.groupby('industrycode').agg({
    'revenue': ['sum', 'mean', 'count']
}).reset_index()
# Flatten the multi-level column names produced by agg() before ingesting
summary.columns = ['industrycode', 'revenue_sum', 'revenue_mean', 'revenue_count']
# Add metadata
summary['analysis_date'] = datetime.now().strftime('%Y-%m-%d')
# Save to staging table
result = ingest_to_dataverse(summary, "Production", "industry_analysis")
print(f"✅ Saved {result['success']} records to {result['table']}")
🏭 INDUSTRY EXAMPLES
HEALTHCARE
Patient Readmission Analysis
import pandas as pd
import matplotlib.pyplot as plt
# Query patient admissions
admissions = query_dataverse("Healthcare_Prod", "msemr_patient",
    select_columns=["msemr_patientid", "msemr_name",
                    "msemr_birthdate", "msemr_chronicconditions"])
visits = query_dataverse("Healthcare_Prod", "msemr_encounter",
    select_columns=["msemr_encounterid", "msemr_patientvalue",
                    "msemr_admitdate", "msemr_dischargedate"])
# Join patients and visits
patient_visits = pd.merge(admissions, visits,
    left_on='msemr_patientid',
    right_on='msemr_patientvalue')
# Convert dates
patient_visits['msemr_admitdate'] = pd.to_datetime(patient_visits['msemr_admitdate'])
patient_visits['msemr_dischargedate'] = pd.to_datetime(patient_visits['msemr_dischargedate'])
# Calculate length of stay
patient_visits['los_days'] = (patient_visits['msemr_dischargedate'] -
                              patient_visits['msemr_admitdate']).dt.days
# Identify readmissions (within 30 days)
patient_visits = patient_visits.sort_values(['msemr_patientid', 'msemr_admitdate'])
patient_visits['next_admit'] = patient_visits.groupby('msemr_patientid')['msemr_admitdate'].shift(-1)
patient_visits['days_to_readmit'] = (patient_visits['next_admit'] -
                                     patient_visits['msemr_dischargedate']).dt.days
patient_visits['is_readmission'] = patient_visits['days_to_readmit'] <= 30
# Analysis
readmit_rate = patient_visits['is_readmission'].sum() / len(patient_visits) * 100
print(f"30-day Readmission Rate: {readmit_rate:.2f}%")
# Readmissions by chronic conditions
chronic_readmit = patient_visits.groupby('msemr_chronicconditions')['is_readmission'].mean() * 100
chronic_readmit = chronic_readmit.sort_values(ascending=False)
# Visualize
plt.figure(figsize=(12, 6))
plt.barh(range(len(chronic_readmit)), chronic_readmit.values)
plt.yticks(range(len(chronic_readmit)), chronic_readmit.index)
plt.xlabel('Readmission Rate (%)')
plt.title('Readmission Rate by Chronic Condition')
plt.tight_layout()
plt.show()
# Save insights
result = ingest_to_dataverse(patient_visits, "Healthcare_Prod", "readmission_analysis")
print(f"✅ Analysis saved: {result['success']} records")
RETAIL
Customer Purchase Patterns & RFM Segmentation
import pandas as pd
import matplotlib.pyplot as plt
# Query customers and orders
customers = query_dataverse("Retail_Prod", "account",
    select_columns=["accountid", "name", "emailaddress1"])
orders = query_dataverse("Retail_Prod", "salesorder",
    select_columns=["salesorderid", "customerid", "totalamount",
                    "ordernumber", "createdon"],
    top=50000)
# RFM Analysis
current_date = pd.Timestamp.now()
orders['createdon'] = pd.to_datetime(orders['createdon'])
rfm = orders.groupby('customerid').agg({
    'createdon': lambda x: (current_date - x.max()).days,  # Recency
    'salesorderid': 'count',                               # Frequency
    'totalamount': 'sum'                                   # Monetary
}).reset_index()
rfm.columns = ['customerid', 'recency_days', 'purchase_count', 'total_spent']
# Score RFM (1-5 scale)
rfm['R_score'] = pd.qcut(rfm['recency_days'], 5, labels=[5,4,3,2,1])
rfm['F_score'] = pd.qcut(rfm['purchase_count'].rank(method='first'), 5, labels=[1,2,3,4,5])
rfm['M_score'] = pd.qcut(rfm['total_spent'], 5, labels=[1,2,3,4,5])
rfm['RFM_score'] = (rfm['R_score'].astype(int) +
                    rfm['F_score'].astype(int) +
                    rfm['M_score'].astype(int))
# Segment customers
def segment(score):
    if score >= 13: return 'Champions'
    elif score >= 10: return 'Loyal'
    elif score >= 7: return 'Potential'
    else: return 'At Risk'
rfm['segment'] = rfm['RFM_score'].apply(segment)
# Join with customer names
rfm_customers = pd.merge(rfm, customers, left_on='customerid', right_on='accountid')
# Segment summary
segments = rfm.groupby('segment').agg({
    'customerid': 'count',
    'total_spent': 'sum'
}).reset_index()
segments.columns = ['segment', 'customer_count', 'total_revenue']
print("📊 Customer Segments:")
print(segments)
# Visualize
fig, axs = plt.subplots(1, 2, figsize=(14, 6))
axs[0].bar(segments['segment'], segments['customer_count'])
axs[0].set_title('Customers by Segment')
axs[0].set_ylabel('Count')
axs[0].tick_params(axis='x', rotation=45)
axs[1].bar(segments['segment'], segments['total_revenue'] / 1000)
axs[1].set_title('Revenue by Segment')
axs[1].set_ylabel('Revenue ($1000s)')
axs[1].tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
# Save
ingest_to_dataverse(rfm_customers, "Retail_Prod", "customer_rfm_scores")
ingest_to_dataverse(segments, "Retail_Prod", "customer_segments")
print("✅ RFM Analysis Complete!")
print("💡 Use these segments for:")
print("   - Targeted email campaigns")
print("   - Loyalty program tiers")
print("   - Win-back campaigns for 'At Risk'")
FINANCE
Loan Default Risk Analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Query loan applications
loans = query_dataverse("Finance_Prod", "loan",
    select_columns=["loanid", "customerid", "amount", "interestrate",
                    "term_months", "status", "credit_score", "income"])
customers = query_dataverse("Finance_Prod", "contact",
    select_columns=["contactid", "fullname", "age",
                    "employment_status", "debt_to_income"])
# Join
loan_data = pd.merge(loans, customers, left_on='customerid', right_on='contactid')
# Calculate risk metrics
loan_data['loan_to_income'] = loan_data['amount'] / loan_data['income']
loan_data['monthly_payment'] = (loan_data['amount'] * (loan_data['interestrate']/100/12) *
    (1 + loan_data['interestrate']/100/12)**loan_data['term_months']) / \
    ((1 + loan_data['interestrate']/100/12)**loan_data['term_months'] - 1)
# Define default (status = 'Defaulted' or 'Delinquent')
loan_data['is_default'] = loan_data['status'].isin(['Defaulted', 'Delinquent'])
# Risk factors analysis
print("📊 Default Rate by Credit Score Range:")
loan_data['credit_range'] = pd.cut(loan_data['credit_score'],
    bins=[0, 600, 650, 700, 750, 850],
    labels=['<600', '600-650', '650-700', '700-750', '750+'])
default_by_credit = loan_data.groupby('credit_range')['is_default'].agg(['sum', 'count'])
default_by_credit['default_rate'] = default_by_credit['sum'] / default_by_credit['count'] * 100
print(default_by_credit)
print("\n📊 Default Rate by Loan-to-Income Ratio:")
loan_data['lti_range'] = pd.cut(loan_data['loan_to_income'],
    bins=[0, 2, 3, 4, 100],
    labels=['<2x', '2-3x', '3-4x', '>4x'])
default_by_lti = loan_data.groupby('lti_range')['is_default'].agg(['sum', 'count'])
default_by_lti['default_rate'] = default_by_lti['sum'] / default_by_lti['count'] * 100
print(default_by_lti)
# Predict risk score (simple scoring model)
def calculate_risk_score(row):
    score = 0
    if row['credit_score'] < 600: score += 40
    elif row['credit_score'] < 650: score += 30
    elif row['credit_score'] < 700: score += 20
    if row['loan_to_income'] > 4: score += 30
    elif row['loan_to_income'] > 3: score += 20
    elif row['loan_to_income'] > 2: score += 10
    if row['debt_to_income'] > 0.5: score += 20
    elif row['debt_to_income'] > 0.4: score += 10
    return score
loan_data['risk_score'] = loan_data.apply(calculate_risk_score, axis=1)
def risk_category(score):
    if score >= 60: return 'High Risk'
    elif score >= 30: return 'Medium Risk'
    else: return 'Low Risk'
loan_data['risk_category'] = loan_data['risk_score'].apply(risk_category)
# Visualize
fig, axs = plt.subplots(2, 2, figsize=(14, 10))
# Chart 1: Default rate by credit score
axs[0,0].bar(range(len(default_by_credit)), default_by_credit['default_rate'])
axs[0,0].set_xticks(range(len(default_by_credit)))
axs[0,0].set_xticklabels(default_by_credit.index)
axs[0,0].set_title('Default Rate by Credit Score')
axs[0,0].set_ylabel('Default Rate (%)')
# Chart 2: Risk distribution
risk_dist = loan_data['risk_category'].value_counts()
axs[0,1].pie(risk_dist.values, labels=risk_dist.index, autopct='%1.1f%%')
axs[0,1].set_title('Loan Portfolio by Risk Category')
# Chart 3: Loan amount by risk
risk_amounts = loan_data.groupby('risk_category')['amount'].sum() / 1000000
axs[1,0].bar(risk_amounts.index, risk_amounts.values)
axs[1,0].set_title('Total Loan Amount by Risk')
axs[1,0].set_ylabel('Amount ($ Millions)')
# Chart 4: Risk score distribution
axs[1,1].hist(loan_data['risk_score'], bins=20, edgecolor='black')
axs[1,1].set_title('Risk Score Distribution')
axs[1,1].set_xlabel('Risk Score')
axs[1,1].set_ylabel('Number of Loans')
plt.tight_layout()
plt.show()
# Save risk analysis
result = ingest_to_dataverse(loan_data, "Finance_Prod", "loan_risk_analysis")
print(f"\n✅ Risk analysis saved: {result['success']} records")
print("\n💡 High Risk Loans require:")
print("   - Additional collateral")
print("   - Co-signer requirement")
print("   - Higher interest rates")
print("   - Enhanced monitoring")
MANUFACTURING
Production Quality & Defect Analysis
import pandas as pd
import matplotlib.pyplot as plt
# Query production data
production = query_dataverse("Manufacturing_Prod", "production_run",
    select_columns=["runid", "product_id", "line_id",
                    "quantity_produced", "quantity_defective",
                    "run_date", "shift", "operator_id"])
products = query_dataverse("Manufacturing_Prod", "product",
    select_columns=["productid", "name", "category", "target_yield"])
# Join
prod_data = pd.merge(production, products, left_on='product_id', right_on='productid')
# Calculate quality metrics
prod_data['defect_rate'] = (prod_data['quantity_defective'] /
                            prod_data['quantity_produced'] * 100)
prod_data['yield_rate'] = ((prod_data['quantity_produced'] - prod_data['quantity_defective']) /
                           prod_data['quantity_produced'] * 100)
prod_data['meets_target'] = prod_data['yield_rate'] >= prod_data['target_yield']
# Analysis by product
product_quality = prod_data.groupby('name').agg({
    'quantity_produced': 'sum',
    'quantity_defective': 'sum',
    'defect_rate': 'mean',
    'yield_rate': 'mean'
}).reset_index()
product_quality = product_quality.sort_values('defect_rate', ascending=False)
print("📊 Top 10 Products by Defect Rate:")
print(product_quality.head(10))
# Analysis by production line
line_quality = prod_data.groupby('line_id').agg({
    'quantity_produced': 'sum',
    'quantity_defective': 'sum',
    'defect_rate': 'mean'
}).reset_index()
line_quality = line_quality.sort_values('defect_rate', ascending=False)
print("\n📊 Production Line Quality:")
print(line_quality)
# Analysis by shift
shift_quality = prod_data.groupby('shift').agg({
    'defect_rate': 'mean',
    'yield_rate': 'mean',
    'quantity_produced': 'sum'
}).reset_index()
print("\n📊 Quality by Shift:")
print(shift_quality)
# Trend analysis
prod_data['run_date'] = pd.to_datetime(prod_data['run_date'])
prod_data['week'] = prod_data['run_date'].dt.isocalendar().week
weekly_quality = prod_data.groupby('week').agg({
    'defect_rate': 'mean',
    'yield_rate': 'mean'
}).reset_index()
# Visualize
fig, axs = plt.subplots(2, 2, figsize=(16, 10))
# Chart 1: Top 10 defect rates by product
top10 = product_quality.head(10)
axs[0,0].barh(range(len(top10)), top10['defect_rate'])
axs[0,0].set_yticks(range(len(top10)))
axs[0,0].set_yticklabels(top10['name'])
axs[0,0].set_xlabel('Defect Rate (%)')
axs[0,0].set_title('Top 10 Products by Defect Rate')
# Chart 2: Line comparison
axs[0,1].bar(line_quality['line_id'], line_quality['defect_rate'])
axs[0,1].set_xlabel('Production Line')
axs[0,1].set_ylabel('Defect Rate (%)')
axs[0,1].set_title('Defect Rate by Production Line')
axs[0,1].axhline(y=prod_data['defect_rate'].mean(), color='r', linestyle='--',
                 label='Average')
axs[0,1].legend()
# Chart 3: Shift comparison
axs[1,0].bar(shift_quality['shift'], shift_quality['defect_rate'])
axs[1,0].set_xlabel('Shift')
axs[1,0].set_ylabel('Defect Rate (%)')
axs[1,0].set_title('Quality by Shift')
# Chart 4: Weekly trend
axs[1,1].plot(weekly_quality['week'], weekly_quality['defect_rate'],
              marker='o', linewidth=2, label='Defect Rate')
axs[1,1].plot(weekly_quality['week'], 100 - weekly_quality['yield_rate'],
              marker='s', linewidth=2, label='Yield Loss')
axs[1,1].set_xlabel('Week Number')
axs[1,1].set_ylabel('Rate (%)')
axs[1,1].set_title('Quality Trend Over Time')
axs[1,1].legend()
axs[1,1].grid(alpha=0.3)
plt.tight_layout()
plt.show()
# Identify problem areas
high_defect_products = product_quality[product_quality['defect_rate'] > 5]
high_defect_lines = line_quality[line_quality['defect_rate'] > 3]
print(f"\n🚨 QUALITY ALERTS:")
print(f"   {len(high_defect_products)} products exceed 5% defect rate")
print(f"   {len(high_defect_lines)} production lines exceed 3% defect rate")
# Save analysis
result1 = ingest_to_dataverse(product_quality, "Manufacturing_Prod", "product_quality_analysis")
result2 = ingest_to_dataverse(line_quality, "Manufacturing_Prod", "line_quality_analysis")
print(f"\n✅ Quality analysis saved!")
print("\n💡 Recommended Actions:")
print("   - Inspect high-defect production lines")
print("   - Additional training for underperforming shifts")
print("   - Quality control process review for problem products")
EDUCATION
Student Performance & Intervention Analysis
import pandas as pd
import matplotlib.pyplot as plt
# Query student data
students = query_dataverse("Education_Prod", "student",
    select_columns=["studentid", "name", "grade_level",
                    "attendance_rate", "free_lunch_eligible"])
grades = query_dataverse("Education_Prod", "grade",
    select_columns=["gradeid", "studentid", "course", "term",
                    "grade_percentage", "credits"])
assessments = query_dataverse("Education_Prod", "assessment",
    select_columns=["assessmentid", "studentid", "test_name",
                    "score", "assessment_date"])
# Join student data with grades
student_grades = pd.merge(students, grades, on='studentid')
# Calculate GPA
def letter_to_gpa(percentage):
    if percentage >= 90: return 4.0
    elif percentage >= 80: return 3.0
    elif percentage >= 70: return 2.0
    elif percentage >= 60: return 1.0
    else: return 0.0
student_grades['gpa_points'] = student_grades['grade_percentage'].apply(letter_to_gpa)
# GPA by student
gpa_summary = student_grades.groupby(['studentid', 'name', 'grade_level',
                                      'attendance_rate', 'free_lunch_eligible']).agg({
    'gpa_points': 'mean',
    'grade_percentage': 'mean',
    'credits': 'sum'
}).reset_index()
gpa_summary.columns = ['studentid', 'name', 'grade_level', 'attendance_rate',
                       'free_lunch_eligible', 'gpa', 'avg_grade', 'total_credits']
# Identify at-risk students
def risk_category(row):
    risk_factors = 0
    if row['gpa'] < 2.0: risk_factors += 3
    elif row['gpa'] < 2.5: risk_factors += 2
    if row['attendance_rate'] < 85: risk_factors += 2
    elif row['attendance_rate'] < 90: risk_factors += 1
    if risk_factors >= 4: return 'High Risk'
    elif risk_factors >= 2: return 'Medium Risk'
    else: return 'On Track'
gpa_summary['risk_category'] = gpa_summary.apply(risk_category, axis=1)
# Analysis
print("📊 Student Risk Distribution:")
print(gpa_summary['risk_category'].value_counts())
print("\n📊 Average GPA by Risk Category:")
print(gpa_summary.groupby('risk_category')['gpa'].mean())
print("\n📊 Correlation: Attendance vs GPA")
corr = gpa_summary[['attendance_rate', 'gpa']].corr()
print(corr)
# Socioeconomic analysis
print("\n📊 Performance by Economic Status:")
econ_analysis = gpa_summary.groupby('free_lunch_eligible').agg({
    'gpa': 'mean',
    'attendance_rate': 'mean',
    'studentid': 'count'
}).reset_index()
econ_analysis.columns = ['free_lunch', 'avg_gpa', 'avg_attendance', 'student_count']
print(econ_analysis)
# Visualize
fig, axs = plt.subplots(2, 2, figsize=(14, 10))
# Chart 1: Risk distribution
risk_counts = gpa_summary['risk_category'].value_counts()
axs[0,0].bar(risk_counts.index, risk_counts.values)
axs[0,0].set_title('Student Risk Distribution')
axs[0,0].set_ylabel('Number of Students')
axs[0,0].tick_params(axis='x', rotation=45)
# Chart 2: Attendance vs GPA scatter
axs[0,1].scatter(gpa_summary['attendance_rate'], gpa_summary['gpa'],
                 alpha=0.5, s=50)
axs[0,1].set_xlabel('Attendance Rate (%)')
axs[0,1].set_ylabel('GPA')
axs[0,1].set_title('Attendance Rate vs GPA')
axs[0,1].grid(alpha=0.3)
# Chart 3: GPA distribution
axs[1,0].hist(gpa_summary['gpa'], bins=20, edgecolor='black')
axs[1,0].set_xlabel('GPA')
axs[1,0].set_ylabel('Number of Students')
axs[1,0].set_title('GPA Distribution')
axs[1,0].axvline(x=2.0, color='r', linestyle='--', label='At-Risk Threshold')
axs[1,0].legend()
# Chart 4: Performance by economic status
axs[1,1].bar(['Free Lunch', 'No Free Lunch'],
             [econ_analysis[econ_analysis['free_lunch']==True]['avg_gpa'].values[0],
              econ_analysis[econ_analysis['free_lunch']==False]['avg_gpa'].values[0]])
axs[1,1].set_ylabel('Average GPA')
axs[1,1].set_title('Academic Performance by Economic Status')
plt.tight_layout()
plt.show()
# Intervention recommendations
high_risk = gpa_summary[gpa_summary['risk_category'] == 'High Risk']
print(f"\n🚨 INTERVENTION NEEDED:")
print(f"   {len(high_risk)} students identified as high risk")
print(f"   Average GPA: {high_risk['gpa'].mean():.2f}")
print(f"   Average Attendance: {high_risk['attendance_rate'].mean():.1f}%")
# Save analysis
result1 = ingest_to_dataverse(gpa_summary, "Education_Prod", "student_risk_analysis")
result2 = ingest_to_dataverse(high_risk, "Education_Prod", "intervention_list")
print(f"\n✅ Analysis saved!")
print("\n💡 Recommended Interventions:")
print("   - Tutoring programs for low GPA students")
print("   - Attendance monitoring and family outreach")
print("   - Additional support for economically disadvantaged students")
print("   - Early warning system for declining performance")
📦 Bulk Data Tab – Complete Tutorial
Mass data ingestion with wizard mode, lookups, and multi-environment support
Manual/File Ingestion
Simple Upload & Ingest
For straightforward data without relationships:
STEP 1: Prepare JSON file
━━━━━━━━━━━━━━━━━━━━━━
[
  {
    "name": "Contoso Ltd",
    "revenue": 5000000,
    "industrycode": "Manufacturing"
  },
  {
    "name": "Fabrikam Inc",
    "revenue": 3500000,
    "industrycode": "Technology"
  }
]
STEP 2: In Bulk Data tab
━━━━━━━━━━━━━━━━━━━━━━
1. Select “Manual/File Ingestion” sub-tab
2. Choose Environment: “Production”
3. Choose Table: “account”
4. Upload JSON file OR paste JSON
5. Click “Start Bulk Ingestion”
6. Monitor progress (100 records per batch)
RESULT:
✅ Records created in Dataverse
📊 Success/error counts displayed
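The same load can also be scripted from the Python tab when the records come out of an analysis step. A minimal sketch using the ingest_to_dataverse helper shown earlier (field values are illustrative):
import pandas as pd
records = pd.DataFrame([
    {"name": "Contoso Ltd", "revenue": 5000000, "industrycode": "Manufacturing"},
    {"name": "Fabrikam Inc", "revenue": 3500000, "industrycode": "Technology"},
])
result = ingest_to_dataverse(records, "Production", "account")
print(f"Created {result['success']} records")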
Wizard Mode (with Lookups)
Create Records with Parent-Child Relationships
SCENARIO: Import contacts linked to accounts
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 1: Prepare data with lookup keys
{
  "contacts": [
    {
      "firstname": "John",
      "lastname": "Doe",
      "emailaddress1": "john.doe@contoso.com",
      "_account_key": "Contoso Ltd"
    },
    {
      "firstname": "Jane",
      "lastname": "Smith",
      "emailaddress1": "jane@fabrikam.com",
      "_account_key": "Fabrikam Inc"
    }
  ]
}
STEP 2: Configure Wizard
━━━━━━━━━━━━━━━━━━━
1. Go to “Wizard Mode (Lookups)” sub-tab
2. Select target table: “contact”
3. Click “Configure Lookup Mapping”
STEP 3: Map Lookup Field
━━━━━━━━━━━━━━━━━━━
Lookup Field: parentcustomerid
Related Table: account
Match Key Field: name
My Data Key: _account_key
4. Click “Add Mapping”
STEP 4: Run Ingestion
━━━━━━━━━━━━━━━━━
5. Upload/paste JSON
6. Click “Start Wizard Ingestion”
RESULT:
✅ System automatically:
– Queries accounts by name
– Finds GUIDs
– Creates @odata.bind references
– Creates contacts with proper relationships
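For reference, the resolution the wizard performs can be approximated from the Python tab. A rough sketch with the documented helpers, assuming account names are unique and that ingest_to_dataverse passes @odata.bind columns through unchanged:
import pandas as pd
# Build a name-to-GUID map from the related table
accounts = query_dataverse("Production", "account",
    select_columns=["accountid", "name"])
name_to_id = dict(zip(accounts["name"], accounts["accountid"]))
contacts = pd.DataFrame([
    {"firstname": "John", "lastname": "Doe", "_account_key": "Contoso Ltd"},
])
# Translate the lookup key into an @odata.bind reference, then drop the helper column
contacts["parentcustomerid@odata.bind"] = contacts["_account_key"].map(
    lambda n: f"/accounts({name_to_id[n]})")
contacts = contacts.drop(columns=["_account_key"])
ingest_to_dataverse(contacts, "Production", "contact")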
Save Configuration for Reuse
After configuring lookups:
1. Click “Save Configuration”
2. Name it: “Contact Import with Account Lookup”
3. Next time: Click “Load Configuration” → Select saved config
4. Upload new data → Run!
💡 Perfect for recurring imports!
Multi-Target Ingestion
Import to Multiple Tables/Environments Simultaneously
SCENARIO: Import customers to multiple regions
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 1: Create Import Groups
━━━━━━━━━━━━━━━━━━━━━━━
Group 1:
Environment: “North_America”
Table: “account”
Data: [NA customers JSON]
Group 2:
Environment: “Europe”
Table: “account”
Data: [EU customers JSON]
Group 3:
Environment: “Asia”
Table: “account”
Data: [Asia customers JSON]
STEP 2: Execute All Groups
━━━━━━━━━━━━━━━━━━━━━━━
Click “Run All Groups”
RESULT:
✅ All regions updated simultaneously
📊 Individual success metrics per group
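If you prefer to drive the same multi-environment load from the Python tab, a small loop over the groups works. A sketch, assuming each region's customer data is already loaded into a DataFrame:
groups = [
    ("North_America", "account", na_customers),   # DataFrames prepared earlier
    ("Europe", "account", eu_customers),
    ("Asia", "account", asia_customers),
]
for env, table, df in groups:
    result = ingest_to_dataverse(df, env, table)
    print(f"{env}: {result['success']} records created")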
Direct @odata.bind
For Advanced Users – Pre-formatted Lookups
When you already have GUIDs:
[
  {
    "firstname": "John",
    "lastname": "Doe",
    "emailaddress1": "john@example.com",
    "parentcustomerid@odata.bind": "/accounts(12345678-1234-1234-1234-123456789012)"
  }
]
Use “Direct @odata.bind” sub-tab
No lookup resolution needed – direct creation!
🏭 INDUSTRY EXAMPLES
HEALTHCARE
Import Patient Appointments with Doctor Lookups
DATA STRUCTURE:
{
  "appointments": [
    {
      "msemr_appointmenttype": "Consultation",
      "msemr_scheduledstart": "2025-02-01T10:00:00Z",
      "msemr_scheduledend": "2025-02-01T10:30:00Z",
      "_patient_key": "P12345",
      "_doctor_key": "D789"
    }
  ]
}
WIZARD CONFIGURATION:
━━━━━━━━━━━━━━━━━━━━
Target Table: msemr_appointment
Lookup 1:
Field: msemr_patient
Related Table: msemr_patient
Match On: msemr_patientnumber
Data Key: _patient_key
Lookup 2:
Field: msemr_practitioner
Related Table: msemr_practitioner
Match On: msemr_doctorcode
Data Key: _doctor_key
RESULT:
✅ 500 appointments imported with proper patient & doctor links
⏱️ 2 minutes total processing time
RETAIL
Import Product Catalog with Category Hierarchy
THREE-LEVEL HIERARCHY:
Category → Subcategory → Product
METHOD: Multi-Step Wizard
━━━━━━━━━━━━━━━━━━━━
STEP 1: Import Categories (no lookups)
{
  "categories": [
    {"name": "Electronics", "code": "ELEC"},
    {"name": "Clothing", "code": "CLTH"}
  ]
}
→ Manual Ingestion to “category” table
STEP 2: Import Subcategories (1 lookup)
{
  "subcategories": [
    {"name": "Laptops", "_parent_cat": "ELEC"},
    {"name": "Phones", "_parent_cat": "ELEC"}
  ]
}
→ Wizard Mode with category lookup
STEP 3: Import Products (2 lookups)
{
  "products": [
    {
      "name": "Dell XPS 15",
      "price": 1299.99,
      "_category": "ELEC",
      "_subcategory": "Laptops"
    }
  ]
}
→ Wizard Mode with category + subcategory lookups
RESULT:
✅ Complete catalog with proper hierarchy
📦 1000 products imported
🔗 All relationships preserved
REAL ESTATE
Import Property Listings with Agent & Office Lookups
DATA:
{
  "listings": [
    {
      "address": "123 Main St",
      "city": "Seattle",
      "price": 750000,
      "bedrooms": 3,
      "bathrooms": 2,
      "_agent_email": "agent@realty.com",
      "_office_code": "SEA001"
    }
  ]
}
WIZARD SETUP:
━━━━━━━━━━━━━
Lookup 1:
Field: listing_agent
Related: contact
Match On: emailaddress1
Data Key: _agent_email
Lookup 2:
Field: listing_office
Related: office
Match On: office_code
Data Key: _office_code
RUN:
Upload 2000 listings → Process in 5 minutes
RESULT:
✅ All listings linked to correct agents & offices
📊 Success rate: 98% (40 failed due to invalid office codes)
🎯 Best Practices
- ✅ Test first: Import 5-10 records before bulk (see the sketch after this list)
- ✅ Save configs: Wizard configurations are reusable
- ✅ Batch size: Automatic 100 records per batch
- ✅ Error handling: Failed records are logged separately
- ✅ Staging first: Import to staging tables for validation
- ✅ Use keys wisely: Choose unique, stable fields for lookups
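The "test first" tip is easy to script from the Python tab: ingest a small slice, check the result, then send the remainder. A minimal sketch, assuming df holds the full batch you plan to load:
sample = df.head(10)                                  # small test batch
result = ingest_to_dataverse(sample, "Production", "account")
if result['success'] == len(sample):
    ingest_to_dataverse(df.iloc[10:], "Production", "account")   # rest of the batch
else:
    print("Test batch had errors, review before bulk loading")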
🔄 Data Deduplication – Complete Tutorial
Find and remove duplicate records before or after import
Matching Options
- Exact match on single field (email)
- Exact match on multiple fields (firstname + lastname)
- Fuzzy matching (similar names)
- Custom key combination
SCENARIO: Import 10,000 contacts, check for duplicates
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 1: Upload Data
10,000 contact records (CSV or JSON)
STEP 2: Select Dataverse Table
Table: contact
Environment: Production
STEP 3: Configure Deduplication
Match Field: emailaddress1
Method: Exact match
STEP 4: Run Check
Click “Check for Duplicates”
RESULTS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Total records uploaded: 10,000
✅ New records (not in Dataverse): 7,234
❌ Duplicates (already in Dataverse): 2,766
STEP 5: Export
Download “New Records Only” → 7,234 clean records
→ Import these to avoid duplicates!
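The same check can be reproduced in the Python tab when you want full control over the matching key. A sketch of an exact match on email, assuming the uploaded records are already in a DataFrame called upload_df:
# Pull existing emails from Dataverse and flag uploads that already exist
existing = query_dataverse("Production", "contact",
    select_columns=["contactid", "emailaddress1"])
existing_emails = set(existing["emailaddress1"].str.lower().dropna())
is_dup = upload_df["emailaddress1"].str.lower().isin(existing_emails)
new_records = upload_df[~is_dup]
duplicates = upload_df[is_dup]
print(f"New: {len(new_records)}, duplicates: {len(duplicates)}")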
CRM CLEANUP
Merge 3 Legacy Databases
CHALLENGE: Combining 3 acquired companies’ customer lists
DATABASE A: 25,000 customers
DATABASE B: 18,000 customers
DATABASE C: 12,000 customers
TOTAL: 55,000 records
DEDUPLICATION PROCESS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Upload Database A → 25,000 records imported
2. Dedupe Database B against Dataverse
Result: 12,456 new, 5,544 duplicates
Import: 12,456 new records
3. Dedupe Database C against Dataverse
Result: 7,891 new, 4,109 duplicates
Import: 7,891 new records
FINAL RESULT:
✅ 45,347 unique customers (vs 55,000 raw)
❌ 9,653 duplicates prevented
💾 Saved: Hours of manual deduping
🔍 CRUD Operations – Complete Tutorial
Single-record Create, Read, Update, Delete operations
When to Use
- Quick single record lookups
- Manual data fixes
- Testing API calls
- One-off updates
CREATE – Add New Record
STEP 1: Select Environment & Table
Environment: Production
Table: account
STEP 2: Enter JSON Payload
{
  "name": "New Customer Corp",
  "revenue": 2500000,
  "industrycode": "Technology",
  "emailaddress1": "info@newcustomer.com"
}
STEP 3: Click “Create”
RESULT:
✅ Record created
📋 New GUID returned: 12345678-1234-1234-1234-123456789012
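Under the hood, Create is a single POST against the Dataverse Web API. If you ever need to reproduce it outside the tab, a minimal sketch with requests; acquiring the bearer token is assumed to happen elsewhere (for example via the device code flow used in Setup):
import requests

env_url = "https://yourorg.crm.dynamics.com"   # from the Environments tab
token = "<bearer token from your auth flow>"   # assumption: obtained separately

resp = requests.post(
    f"{env_url}/api/data/v9.2/accounts",
    headers={"Authorization": f"Bearer {token}",
             "Content-Type": "application/json"},
    json={"name": "New Customer Corp", "revenue": 2500000},
)
resp.raise_for_status()
# Dataverse returns the new record's URI (with its GUID) in the OData-EntityId header
print(resp.headers.get("OData-EntityId"))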
READ – Retrieve Record
STEP 1: Enter Record ID
GUID: 12345678-1234-1234-1234-123456789012
STEP 2: Click “Read”
RESULT:
{
  "accountid": "12345678-1234-1234-1234-123456789012",
  "name": "New Customer Corp",
  "revenue": 2500000,
  "industrycode": "Technology",
  "createdon": "2025-01-26T15:30:00Z"
}
UPDATE – Modify Record
STEP 1: Enter Record ID
GUID: 12345678-1234-1234-1234-123456789012
STEP 2: Enter Update JSON (only fields to change)
{
  "revenue": 3000000,
  "websiteurl": "https://newcustomer.com"
}
STEP 3: Click “Update”
RESULT:
✅ Record updated
📝 Fields modified: 2
DELETE – Remove Record
STEP 1: Enter Record ID
GUID: 12345678-1234-1234-1234-123456789012
STEP 2: Click “Delete”
STEP 3: Confirm deletion
RESULT:
✅ Record deleted
⚠️ This action cannot be undone!