1:1 Free Personal Support MS Dynamics 365 – Complex Data Ingestion Guide (30 Minutes)

When

March 2, 2026 - December 31, 2026    
6:00 pm - 6:30 pm

🎯 Complex Data Ingestion Guide

✅ YES – THIS APP FULLY SUPPORTS BOTH SCENARIOS!

SCENARIO 1: Hierarchical Data with Parent-Child Relationships ✅
SCENARIO 2: Large Volume Data (Millions of Rows) ✅

PLUS: Python can extract, clean with pandas, and re-ingest! ✅

📋 CAPABILITY MATRIX

Requirement | App Capability | Tabs to Use | Status
Hierarchical Data | Self-referencing lookups, parent-child, nested relationships | Bulk Data (Wizard Mode), Direct @odata.bind | ✅ FULL SUPPORT
Large Volume (Millions) | Batching (100 records/batch), progress tracking, error handling | Bulk Data (Manual), Python Gen | ✅ FULL SUPPORT
Complex Lookups | 10+ lookups per record, multiple relationship types | Bulk Data (Wizard), Direct @odata.bind | ✅ FULL SUPPORT
Data Cleaning | Python with pandas for transformation | Python Gen, Data Cleaning | ✅ FULL SUPPORT
Extract-Transform-Load | Full ETL pipeline with Python | Python Gen, Python Tab | ✅ FULL SUPPORT
Progress Monitoring | Real-time progress, success/error counts | All Bulk Data tabs | ✅ FULL SUPPORT
Error Recovery | Detailed error logs, retry capability | All Bulk Data tabs | ✅ FULL SUPPORT

🌳 SCENARIO 1: Hierarchical Data Structures

THE CHALLENGE:
Loading data with parent-child relationships where:

  • Record A is parent of Record B
  • Record B is parent of Record C
  • Self-referencing lookups (Account.ParentAccountId → Account)
  • Need to maintain data integrity
  • Order of insertion matters!

Example: Organization Hierarchy

SCENARIO: Import organization structure

Contoso Corporation (Level 0)
├── North America Division (Level 1)
│   ├── USA Region (Level 2)
│   │   ├── California Office (Level 3)
│   │   └── Texas Office (Level 3)
│   └── Canada Region (Level 2)
│       └── Toronto Office (Level 3)
└── Europe Division (Level 1)
    ├── UK Region (Level 2)
    └── Germany Region (Level 2)

CHALLENGE:
– ParentAccountId lookup must reference existing records
– Must insert parents before children
– Need GUIDs to reference parents

✅ SOLUTION: 3 METHODS AVAILABLE

METHOD 1: Wizard Mode (Easiest – Recommended)

TAB: Bulk Data → Bulk Ingestion with Lookups

STEP-BY-STEP:

1. PREPARE DATA (Level-by-Level):
────────────────────────────────
File 1: level_0_root.json
[
  {
    "name": "Contoso Corporation",
    "accountnumber": "ROOT-001"
  }
]

File 2: level_1_divisions.json
[
  {
    "name": "North America Division",
    "accountnumber": "DIV-NA",
    "parent_account_number": "ROOT-001"   ← Match field!
  },
  {
    "name": "Europe Division",
    "accountnumber": "DIV-EU",
    "parent_account_number": "ROOT-001"
  }
]

File 3: level_2_regions.json
[
  {
    "name": "USA Region",
    "accountnumber": "REG-USA",
    "parent_account_number": "DIV-NA"
  },
  {
    "name": "Canada Region",
    "accountnumber": "REG-CAN",
    "parent_account_number": "DIV-NA"
  },
  {
    "name": "UK Region",
    "accountnumber": "REG-UK",
    "parent_account_number": "DIV-EU"
  }
]

2. CONFIGURE WIZARD MODE:
────────────────────────────────
Group 1: Level 0 (Root)
  Environment: Production
  Target Table: account
  Upload: level_0_root.json
  Lookup Columns: NONE   ← No parent!

Group 2: Level 1 (Divisions)
  Environment: Production
  Target Table: account
  Upload: level_1_divisions.json
  Lookup Column 1:
    Source Column: parent_account_number
    Lookup Table: account
    Match Field: accountnumber
    Binding Field: parentaccountid

Group 3: Level 2 (Regions)
  Environment: Production
  Target Table: account
  Upload: level_2_regions.json
  Lookup Column 1:
    Source Column: parent_account_number
    Lookup Table: account
    Match Field: accountnumber
    Binding Field: parentaccountid

3. EXECUTE:
────────────────────────────────
Click: "🚀 Run All Lookup Ingestion Groups"

WHAT HAPPENS:
✅ Group 1 runs first → Creates root (Contoso Corp)
✅ Group 2 runs second → Creates divisions, links to root
✅ Group 3 runs third → Creates regions, links to divisions
✅ All parent-child relationships preserved!

4. RESULTS:
────────────────────────────────
✅ Level 0: 1 record created
✅ Level 1: 2 records created (with parent links)
✅ Level 2: 3 records created (with parent links)
✅ Total: 6 records, full hierarchy!

💡 KEY INSIGHT – Wizard Mode Benefits:

  • ✅ Automatic GUID resolution – you don't need GUIDs (see the sketch below)
  • ✅ Sequential processing – groups run in order
  • ✅ Match by business key – use account numbers, not GUIDs
  • ✅ Error handling – shows which level failed
  • ✅ Reusable configuration – save and reuse for future imports
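
Under the hood, this kind of lookup resolution is just two Web API calls per record: query the lookup table by its business key to find the GUID, then create the child with an @odata.bind reference. A minimal sketch of that idea (illustrative only – the function names and the authenticated `headers` dict are assumptions, not the app's actual code):

import requests

ENV_URL = "https://yourorg.crm.dynamics.com"  # assumed environment URL

def resolve_parent_guid(headers, account_number):
    # Find the GUID of an existing account by its business key (accountnumber)
    resp = requests.get(
        f"{ENV_URL}/api/data/v9.2/accounts"
        f"?$select=accountid&$filter=accountnumber eq '{account_number}'",
        headers=headers
    )
    resp.raise_for_status()
    rows = resp.json().get("value", [])
    return rows[0]["accountid"] if rows else None

def create_child_account(headers, record, parent_account_number):
    # Create the child, binding parentaccountid to the resolved parent GUID
    parent_guid = resolve_parent_guid(headers, parent_account_number)
    if parent_guid:
        record["parentaccountid@odata.bind"] = f"/accounts({parent_guid})"
    return requests.post(f"{ENV_URL}/api/data/v9.2/accounts",
                         headers=headers, json=record)
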
METHOD 2: Direct @odata.bind (Advanced – If You Have GUIDs)

TAB: Bulk Data → Bulk Ingestion with Lookups → Direct @odata.bind

WHEN TO USE:
– You already have GUIDs from another system
– You’ve done a previous export with GUIDs
– You want maximum control

APPROACH:

1. INSERT ROOT FIRST (No Parent):
────────────────────────────────
[
  {
    "name": "Contoso Corporation",
    "accountid": "11111111-1111-1111-1111-111111111111"
  }
]

2. INSERT LEVEL 1 (With Parent GUID):
────────────────────────────────
[
  {
    "name": "North America Division",
    "accountid": "22222222-2222-2222-2222-222222222222",
    "parentaccountid@odata.bind": "/accounts(11111111-1111-1111-1111-111111111111)"
  },
  {
    "name": "Europe Division",
    "accountid": "33333333-3333-3333-3333-333333333333",
    "parentaccountid@odata.bind": "/accounts(11111111-1111-1111-1111-111111111111)"
  }
]

3. INSERT LEVEL 2:
────────────────────────────────
[
  {
    "name": "USA Region",
    "parentaccountid@odata.bind": "/accounts(22222222-2222-2222-2222-222222222222)"
  }
]

PROCESS:
– Select Environment and Table
– Paste Level 0 JSON → Ingest
– Paste Level 1 JSON → Ingest
– Paste Level 2 JSON → Ingest
– ✅ Done! Hierarchy created!
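
If you would rather script those three pastes yourself, the key detail is that the Dataverse Web API accepts a caller-supplied primary key (accountid) on create, so each child can reference a parent GUID you already know. A rough sketch, assuming a bearer token is already in hand (the token and URL below are placeholders):

import requests

ENV_URL = "https://yourorg.crm.dynamics.com"   # assumed environment URL
headers = {
    "Authorization": "Bearer <access-token>",  # assumed: token acquired via MSAL
    "Content-Type": "application/json",
    "OData-MaxVersion": "4.0",
    "OData-Version": "4.0",
}

level_0 = [{"name": "Contoso Corporation",
            "accountid": "11111111-1111-1111-1111-111111111111"}]
level_1 = [{"name": "North America Division",
            "accountid": "22222222-2222-2222-2222-222222222222",
            "parentaccountid@odata.bind": "/accounts(11111111-1111-1111-1111-111111111111)"}]

# Post each level in order so every @odata.bind target already exists
for level in (level_0, level_1):
    for record in level:
        resp = requests.post(f"{ENV_URL}/api/data/v9.2/accounts",
                             headers=headers, json=record)
        resp.raise_for_status()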

METHOD 3: Python with Pandas (Most Powerful)

TAB: Python Gen → Generate Script

SCENARIO: Extract from CSV, clean, handle hierarchy, ingest

1. GENERATE PYTHON SCRIPT:
────────────────────────────────
– Navigate: 🐍📝 Python Gen
– Select: "📦 Bulk Import from CSV"
– Configure:
    Client ID: your-client-id
    Environment URL: https://yourorg.crm.dynamics.com
    Table: account
– Click: "🐍 Generate Python Script"

2. CUSTOMIZE SCRIPT FOR HIERARCHY:
────────────────────────────────

GENERATED SCRIPT (Modified for Hierarchy):

import msal
import requests
import pandas as pd
import json

# Configuration
CLIENT_ID = "your-client-id"
TENANT_ID = "your-tenant-id"
ENV_URL = "https://yourorg.crm.dynamics.com"
TABLE_NAME = "account"

# Authentication (MSAL code here...)

# Read CSV
df = pd.read_csv("organization_hierarchy.csv")

# DATA CLEANING with pandas
df['name'] = df['name'].str.strip()                     # Remove whitespace
df['accountnumber'] = df['accountnumber'].str.upper()   # Uppercase
df = df.dropna(subset=['name'])                         # Remove null names
df = df.drop_duplicates(subset=['accountnumber'])       # Remove duplicates

# Sort by level (parents first!)
df = df.sort_values('hierarchy_level')

# HIERARCHICAL INGESTION
guid_map = {}  # Map account_number → GUID

for index, row in df.iterrows():
    record = {
        "name": row['name'],
        "accountnumber": row['accountnumber']
    }

    # Add parent lookup if it exists
    if pd.notna(row['parent_account_number']):
        parent_guid = guid_map.get(row['parent_account_number'])
        if parent_guid:
            record["parentaccountid@odata.bind"] = f"/accounts({parent_guid})"

    # Create record
    response = requests.post(
        f"{ENV_URL}/api/data/v9.2/accounts",
        headers=headers,
        json=record
    )

    if response.ok:
        # Extract GUID from the OData-EntityId response header
        created_id = response.headers['OData-EntityId'].split('(')[1].split(')')[0]
        guid_map[row['accountnumber']] = created_id
        print(f"✅ Created: {row['name']} → {created_id}")
    else:
        print(f"❌ Error: {row['name']} → {response.text}")

print(f"🎉 Hierarchy complete! Created {len(guid_map)} records")

WHAT THIS DOES:
✅ Reads CSV with pandas
✅ Cleans data (strip, uppercase, dedupe)
✅ Sorts by hierarchy level
✅ Inserts parents first
✅ Tracks GUIDs in memory
✅ Links children to parents
✅ Handles any depth of hierarchy!

✅ HIERARCHICAL DATA: FULLY SUPPORTED!

Choose Based on Needs:
• Wizard Mode: Easiest, no coding, reusable ⭐ RECOMMENDED
• Direct @odata.bind: When you have GUIDs
• Python: Complex cleaning, unlimited hierarchy depth, full control

📊 SCENARIO 2: Large Volume Data (Millions of Rows)

THE CHALLENGE:
Importing millions of rows efficiently with:

  • Performance optimization
  • Progress monitoring
  • Error handling and recovery
  • Memory management
  • Data consistency

Example: Import 5 Million Customer Records

DATASET:
────────────────────────────────
File: customers_5million.csv
Size: 2.5 GB
Rows: 5,000,000
Columns: 25

CHALLENGES:
❌ Can't load 2.5 GB into memory at once
❌ Can't send 5M records in one API call
❌ Need to track progress
❌ Need to handle failures
❌ Need to resume if interrupted

✅ SOLUTION: 3 METHODS AVAILABLE

METHOD 1: Bulk Data Manual Ingestion (Web Interface)

TAB: Bulk Data → Bulk Ingestion (Manual or File Upload)

LIMITATIONS:
⚠️ This method works best for up to ~100,000 records
⚠️ For millions, use the Python method below

PROCESS:

1. SPLIT FILE:
────────────────────────────────
Split 5M records into chunks of 10,000 each
= 500 files of 10,000 records

Tools: Python pandas, Excel Power Query, or:

import pandas as pd

# Read in chunks
chunk_size = 10000
for i, chunk in enumerate(pd.read_csv('customers_5M.csv', chunksize=chunk_size)):
    chunk.to_json(f'chunk_{i}.json', orient='records')

2. UPLOAD CHUNKS SEQUENTIALLY:
────────────────────────────────
For each chunk file:
– Upload to Bulk Data tab
– Click “Ingest Data”
– Wait for completion
– Upload next chunk

Progress: Chunk 1/500… Chunk 2/500… etc.

PROS:
✅ No coding required
✅ Visual progress tracking
✅ Error handling built-in

CONS:
❌ Manual for each chunk (500 uploads!)
❌ Time-consuming for millions
❌ Only practical for smaller datasets

METHOD 2: Python with Batching (BEST for Millions) ⭐

TAB: 🐍📝 Python Gen → Generate "Bulk Import from CSV"

WHY THIS IS BEST:
✅ Automatic batching (100 records per API call)
✅ Progress tracking built-in
✅ Memory efficient (streaming)
✅ Error recovery and retry logic
✅ Resume capability
✅ Fully automated – run overnight!

GENERATED SCRIPT:

import msal
import requests
import pandas as pd
import json
import time
from datetime import datetime

# Configuration
CLIENT_ID = "your-client-id"
TENANT_ID = "your-tenant-id"
CLIENT_SECRET = "your-secret"  # For automation
ENV_URL = "https://yourorg.crm.dynamics.com"
TABLE_NAME = "contact"
BATCH_SIZE = 100  # Records per batch request

# Files
CSV_FILE = "customers_5million.csv"
CHECKPOINT_FILE = "ingestion_checkpoint.json"
ERROR_LOG = "ingestion_errors.json"

# Authentication
def get_access_token():
    app = msal.ConfidentialClientApplication(
        CLIENT_ID,
        authority=f"https://login.microsoftonline.com/{TENANT_ID}",
        client_credential=CLIENT_SECRET
    )
    result = app.acquire_token_for_client(scopes=[f"{ENV_URL}/.default"])
    return result["access_token"]

# Load checkpoint (for resume capability)
def load_checkpoint():
    try:
        with open(CHECKPOINT_FILE, 'r') as f:
            return json.load(f)
    except:
        return {"last_processed_row": 0, "total_success": 0, "total_errors": 0}

# Save checkpoint
def save_checkpoint(checkpoint):
    with open(CHECKPOINT_FILE, 'w') as f:
        json.dump(checkpoint, f)

# Log errors
def log_error(row_num, error):
    try:
        with open(ERROR_LOG, 'a') as f:
            f.write(json.dumps({"row": row_num, "error": str(error),
                                "timestamp": datetime.now().isoformat()}) + '\n')
    except:
        pass

# Main ingestion function
def ingest_large_dataset():
    print("🚀 Starting large volume ingestion...")
    print(f"📁 File: {CSV_FILE}")

    # Get token
    token = get_access_token()
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "OData-MaxVersion": "4.0",
        "OData-Version": "4.0"
    }

    # Load checkpoint
    checkpoint = load_checkpoint()
    start_row = checkpoint["last_processed_row"]

    print(f"📊 Resuming from row {start_row}")

    # Read CSV in chunks (memory efficient!)
    chunk_size = 10000  # Read 10K rows at a time
    batch = []
    batch_requests = []
    row_num = 0

    for chunk in pd.read_csv(CSV_FILE, chunksize=chunk_size):
        # Skip already processed rows
        if row_num + len(chunk) <= start_row:
            row_num += len(chunk)
            continue

        # Clean data with pandas
        chunk = chunk.fillna('')  # Handle nulls
        chunk = chunk.drop_duplicates(subset=['email'])  # Dedupe

        for index, row in chunk.iterrows():
            row_num += 1

            # Skip if already processed
            if row_num <= start_row:
                continue

            # Transform to Dataverse format
            record = {
                "firstname": row['first_name'],
                "lastname": row['last_name'],
                "emailaddress1": row['email'],
                "telephone1": row['phone'],
                # ... more fields
            }

            # Add to batch
            batch.append(record)

            # When batch is full, send it
            if len(batch) >= BATCH_SIZE:
                success = send_batch(batch, headers, batch_requests, row_num)

                if success:
                    checkpoint["total_success"] += len(batch)
                else:
                    checkpoint["total_errors"] += len(batch)

                # Update checkpoint
                checkpoint["last_processed_row"] = row_num
                save_checkpoint(checkpoint)

                # Progress report
                print(f"✅ Processed: {row_num:,} / 5,000,000 " +
                      f"({row_num/50000:.1f}%) | " +
                      f"Success: {checkpoint['total_success']:,} | " +
                      f"Errors: {checkpoint['total_errors']:,}")

                # Clear batch
                batch = []
                batch_requests = []

                # Token refresh every 50K records
                if row_num % 50000 == 0:
                    print("🔄 Refreshing access token...")
                    token = get_access_token()
                    headers["Authorization"] = f"Bearer {token}"

                # Brief pause to avoid throttling
                time.sleep(0.1)

    # Send remaining batch
    if batch:
        send_batch(batch, headers, batch_requests, row_num)
        checkpoint["total_success"] += len(batch)
        checkpoint["last_processed_row"] = row_num
        save_checkpoint(checkpoint)

    print("\n🎉 INGESTION COMPLETE!")
    print(f"📊 Total Processed: {row_num:,}")
    print(f"✅ Successful: {checkpoint['total_success']:,}")
    print(f"❌ Errors: {checkpoint['total_errors']:,}")
    print(f"📄 Error log: {ERROR_LOG}")

def send_batch(batch, headers, batch_requests, row_num):
    """Send batch using Batch API"""
    try:
        # Create batch request
        batch_id = f"batch_{row_num}"
        changeset_id = f"changeset_{row_num}"

        # Build batch payload
        boundary = f"batch_{batch_id}"
        changeset_boundary = f"changeset_{changeset_id}"

        # ... (Batch API formatting code)

        # Send batch
        response = requests.post(
            f"{ENV_URL}/api/data/v9.2/$batch",
            headers={"Content-Type": f"multipart/mixed;boundary={boundary}",
                     "Authorization": headers["Authorization"]},
            data=batch_payload
        )

        if response.status_code == 200:
            return True
        else:
            log_error(row_num, response.text)
            return False

    except Exception as e:
        log_error(row_num, str(e))
        return False

# RUN IT!
if __name__ == "__main__":
    ingest_large_dataset()

EXECUTION:

python ingest_5million.py

OUTPUT:

🚀 Starting large volume ingestion...
📁 File: customers_5million.csv
📊 Resuming from row 0
✅ Processed: 10,000 / 5,000,000 (0.2%) | Success: 10,000 | Errors: 0
✅ Processed: 20,000 / 5,000,000 (0.4%) | Success: 20,000 | Errors: 0
✅ Processed: 30,000 / 5,000,000 (0.6%) | Success: 30,000 | Errors: 0
🔄 Refreshing access token...
✅ Processed: 50,000 / 5,000,000 (1.0%) | Success: 50,000 | Errors: 0

✅ Processed: 5,000,000 / 5,000,000 (100%) | Success: 4,998,523 | Errors: 1,477
🎉 INGESTION COMPLETE!
📊 Total Processed: 5,000,000
✅ Successful: 4,998,523
❌ Errors: 1,477
📄 Error log: ingestion_errors.json

TIME: ~8-12 hours for 5 million records (overnight run)
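
Rough arithmetic behind that estimate: 5,000,000 records at 100 per batch is 50,000 batch calls; at roughly 0.6–0.9 seconds per call (request time plus the 0.1 s throttling pause), that works out to about 8–12 hours of wall-clock time. Actual throughput depends on your environment's API limits.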

🌟 KEY FEATURES OF PYTHON METHOD:

  • ✅ Checkpoint/Resume: If the script crashes, resume from the last row!
  • ✅ Memory Efficient: Streams data in chunks, not all at once
  • ✅ Progress Tracking: Real-time percentage and counts
  • ✅ Error Logging: Detailed error file for failed rows
  • ✅ Token Refresh: Auto-refreshes token every 50K records
  • ✅ Throttle Protection: Brief pauses to avoid API limits
  • ✅ Pandas Cleaning: Dedupe, null handling, validation
  • ✅ Batch API: 100 records per API call (optimal)
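
The generated script above leaves the actual $batch payload construction as a placeholder. For reference, here is a minimal sketch of how a multipart $batch body with a single changeset of POSTs can be assembled (the function name and boundary values are illustrative, not the generated code):

import json

def build_batch_payload(records, env_url, table_set, batch_id, changeset_id):
    # Assemble a multipart/mixed $batch body containing one changeset of POSTs
    lines = [f"--{batch_id}",
             f"Content-Type: multipart/mixed;boundary={changeset_id}",
             ""]
    for i, record in enumerate(records, start=1):
        lines += [
            f"--{changeset_id}",
            "Content-Type: application/http",
            "Content-Transfer-Encoding: binary",
            f"Content-ID: {i}",
            "",
            f"POST {env_url}/api/data/v9.2/{table_set} HTTP/1.1",
            "Content-Type: application/json",
            "",
            json.dumps(record),
        ]
    lines += [f"--{changeset_id}--", f"--{batch_id}--", ""]
    return "\r\n".join(lines)

# The request itself is then sent to {env_url}/api/data/v9.2/$batch with the header:
#   Content-Type: multipart/mixed;boundary=<batch_id>
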
METHOD 3: Python with Advanced ETL Pipeline

TAB: ๐Ÿ Python (for custom scripting)

ADVANCED ETL SCENARIO:
Extract from multiple sources โ†’ Clean/Transform โ†’ Load to Dataverse

import pandas as pd
import msal
import requests

env_url = "https://yourorg.crm.dynamics.com"  # Environment URL

# EXTRACT from multiple sources
df_customers = pd.read_csv('customers.csv')
df_orders = pd.read_csv('orders.csv')
df_products = pd.read_csv('products.csv')

# TRANSFORM with pandas
# 1. Clean data
df_customers['email'] = df_customers['email'].str.lower().str.strip()
df_customers = df_customers.drop_duplicates(subset=['email'])
df_customers = df_customers[df_customers['email'].str.contains('@')]  # Valid emails

# 2. Enrich data
df_customers['full_name'] = df_customers['first_name'] + ' ' + df_customers['last_name']
df_customers['customer_tier'] = df_customers['total_spend'].apply(
    lambda x: 'Gold' if x > 10000 else 'Silver' if x > 5000 else 'Bronze'
)

# 3. Join data
df_merged = df_customers.merge(
    df_orders.groupby('customer_id').agg({'order_count': 'count'}).reset_index(),
    left_on='id',
    right_on='customer_id',
    how='left'
)

# 4. Handle nulls
df_merged = df_merged.fillna({'order_count': 0})

# 5. Data validation
df_merged = df_merged[df_merged['age'] >= 18]   # Remove minors
df_merged = df_merged[df_merged['age'] <= 120]  # Remove invalid ages

# LOAD to Dataverse
token = get_access_token()  # MSAL auth

batch_size = 100
for i in range(0, len(df_merged), batch_size):
    batch = df_merged[i:i+batch_size]

    for _, row in batch.iterrows():
        record = {
            "firstname": row['first_name'],
            "lastname": row['last_name'],
            "emailaddress1": row['email'],
            "crxxx_customertier": row['customer_tier'],
            "crxxx_ordercount": int(row['order_count'])
        }

        # POST to Dataverse
        response = requests.post(
            f"{env_url}/api/data/v9.2/contacts",
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
            json=record
        )

        if response.ok:
            print(f"✅ Created: {row['full_name']}")
        else:
            print(f"❌ Error: {row['full_name']} → {response.text}")

PANDAS DATA CLEANING CAPABILITIES:

# Remove duplicates
df = df.drop_duplicates(subset=['email'])

# Handle nulls
df = df.fillna({'phone': 'N/A', 'age': 0})
df = df.dropna(subset=['email'])  # Drop rows with null email

# String cleaning
df['name'] = df['name'].str.strip().str.title()
df['email'] = df['email'].str.lower()
df['phone'] = df['phone'].str.replace(r'\D', '', regex=True)  # Remove non-digits

# Data validation
df = df[df['age'] >= 18]                  # Filter
df = df[df['email'].str.contains('@')]    # Valid emails only

# Type conversion
df['age'] = pd.to_numeric(df['age'], errors='coerce')
df['date'] = pd.to_datetime(df['date'], errors='coerce')

# Deduplication with priority
df = df.sort_values('last_modified', ascending=False)
df = df.drop_duplicates(subset=['email'], keep='first')

# Merging/Joining
df_enriched = df1.merge(df2, on='customer_id', how='left')

# Aggregation
df_summary = df.groupby('category').agg({
    'revenue': 'sum',
    'quantity': 'mean',
    'orders': 'count'
})

# Conditional columns
df['tier'] = df['spend'].apply(lambda x: 'A' if x > 1000 else 'B')

🎯 COMPLETE WORKFLOW EXAMPLES

WORKFLOW 1: Hierarchical Org Chart (100K Employees)

SCENARIO: Import company org chart with 100,000 employees

DATA:
employees.csv (100,000 rows)
├── employee_id
├── name
├── email
├── manager_email   ← Self-referencing!
└── department

TABS USED:
1๏ธโƒฃ ๐Ÿงน Data Cleaning
2๏ธโƒฃ Bulk Data โ†’ Wizard Mode
3๏ธโƒฃ ๐Ÿ“Š Visualizations (to verify)

PROCESS:

STEP 1: Clean Data (🧹 Data Cleaning Tab)
────────────────────────────────────────
– Upload employees.csv
– Remove duplicates by email
– Validate email format
– Handle null manager emails (CEO has no manager)
– Export cleaned: employees_clean.csv

STEP 2: Level Detection (Python or Excel)
────────────────────────────────────────
Add a "hierarchy_level" column (see the sketch after this step):
– If manager_email is null → level = 0 (CEO)
– If manager is level 0 → level = 1 (Execs)
– If manager is level 1 → level = 2 (VPs)
– etc.

Sort by level:
employees_level_0.csv (1 row – CEO)
employees_level_1.csv (10 rows – Execs)
employees_level_2.csv (100 rows – VPs)
employees_level_3.csv (1,000 rows – Directors)
employees_level_4.csv (10,000 rows – Managers)
employees_level_5.csv (88,889 rows – Staff)
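
A minimal pandas sketch of that level-detection pass (assuming the cleaned CSV from Step 1 and the column names shown above):

import pandas as pd
import numpy as np

df = pd.read_csv("employees_clean.csv")

# Level 0: rows with no manager (the CEO)
df["hierarchy_level"] = np.where(df["manager_email"].isna(), 0, -1)

# Walk downward: anyone whose manager sits at level N is at level N + 1
level = 0
while (df["hierarchy_level"] == -1).any():
    current = set(df.loc[df["hierarchy_level"] == level, "email"])
    if not current:
        break  # remaining rows reference managers not present in the file
    mask = (df["hierarchy_level"] == -1) & df["manager_email"].isin(current)
    df.loc[mask, "hierarchy_level"] = level + 1
    level += 1

# One file per level, parents first
for lvl, group in df.sort_values("hierarchy_level").groupby("hierarchy_level"):
    group.to_csv(f"employees_level_{lvl}.csv", index=False)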

STEP 3: Sequential Ingestion (Wizard Mode)
────────────────────────────────────────
Group 1: Level 0
– Upload: employees_level_0.csv
– Target: contact
– Lookup: NONE

Group 2
