KPMG Data Engineering Interview Q & A
✍️ By MONU SINGH | 11/18/2025
Here are 8 important questions and answers asked in the KPMG Data Engineering Interview 2019.
1. How do you create and deploy notebooks in Databricks?
Answer:
Creating a notebook:
1. Log in to Azure Databricks.
2. Click on Workspace > Users > your email.
3. Click Create > Notebook.
4. Name the notebook, choose a default language (Python, SQL, Scala, etc.), and attach a cluster.
Deploying notebooks:
• Manual execution: Run the notebook interactively.
• Scheduled job: Convert the notebook into a job and set a schedule.
• CI/CD deployment: Use Git (e.g., Azure DevOps) to store notebooks and deploy using the Databricks CLI or REST API in pipelines (see the sketch below).
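As a concrete illustration of the CI/CD option, here is a minimal sketch that pushes a local notebook into a workspace via the Databricks REST API (/api/2.0/workspace/import). The host, token, and paths are placeholders you would normally supply from your pipeline's secret store.

import base64
import requests

# Placeholders - in a real pipeline these come from CI/CD secrets/variables.
DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"
DATABRICKS_TOKEN = "<personal-access-token>"

def deploy_notebook(local_path: str, workspace_path: str) -> None:
    """Upload a source-format notebook, overwriting any existing copy."""
    with open(local_path, "rb") as f:
        content = base64.b64encode(f.read()).decode("utf-8")
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/workspace/import",
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        json={
            "path": workspace_path,  # e.g. /Users/you@company.com/etl_notebook
            "format": "SOURCE",
            "language": "PYTHON",
            "content": content,
            "overwrite": True,
        },
    )
    resp.raise_for_status()

deploy_notebook("notebooks/etl_notebook.py", "/Users/you@company.com/etl_notebook")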
2. What are the best practices for data archiving and retention in Azure?
Answer:
• Use Azure Data Lake/Blob for long-term storage.
• Apply Lifecycle Management Policies (a sketch follows this list):
o Move old data to cool/archive tiers based on age.
o Automatically delete data after the retention period ends.
• Tag data with metadata for classification (e.g., creation date).
• Encrypt and secure archived data (use CMK if needed).
• Monitor access and compliance using Azure Monitor/Azure Purview.
• Document and automate retention policies in your data governance strategy.
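As one way to automate the tiering and deletion bullets above, here is a hedged sketch using the azure-mgmt-storage Python SDK. The subscription, resource group, account name, prefix, and day thresholds are all illustrative; the same rule can equally be defined in the portal or as raw JSON.

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

# Illustrative names - replace with your own subscription/resources.
client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

lifecycle_policy = {
    "policy": {
        "rules": [
            {
                "name": "archive-and-expire",
                "enabled": True,
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blob_types": ["blockBlob"], "prefix_match": ["raw/"]},
                    "actions": {
                        "base_blob": {
                            # Age thresholds here are examples, not recommendations.
                            "tier_to_cool": {"days_after_modification_greater_than": 30},
                            "tier_to_archive": {"days_after_modification_greater_than": 90},
                            "delete": {"days_after_modification_greater_than": 2555},
                        }
                    },
                },
            }
        ]
    }
}

# The management policy name must be "default".
client.management_policies.create_or_update(
    "<resource-group>", "<storage-account>", "default", lifecycle_policy
)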
3. How do you connect ADLS (Azure Data Lake Storage) to Databricks?
Answer:
You can connect using OAuth (Service Principal) or an Access Key.
Mounting ADLS Gen2 to Databricks:
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<client-id>",
    "fs.azure.account.oauth2.client.secret": "<client-secret>",
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/mydata",
    extra_configs=configs,
)
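Once mounted, the path behaves like any other DBFS location. A quick sanity check might look like the following; the folder name is a hypothetical placeholder:

display(dbutils.fs.ls("/mnt/mydata"))                  # list the mounted container
df = spark.read.parquet("/mnt/mydata/<some-folder>")   # hypothetical folder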
4. Write a SQL query to list all employees who joined in the last 6 months.
Answer:
SELECT *
FROM employees
WHERE join_date >= DATEADD(MONTH, -6, GETDATE());

Adjust function names (GETDATE() or CURRENT_DATE) depending on the SQL dialect (SQL Server, MySQL, etc.).
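Since the rest of this interview leans on Databricks, here is the same query as a dialect example in Spark SQL, run from Python; it assumes a hypothetical employees table registered in the metastore:

recent_hires = spark.sql("""
    SELECT *
    FROM employees
    WHERE join_date >= add_months(current_date(), -6)
""")
recent_hires.show()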
5. How do you implement data validation and quality checks in ADF?
Answer:
• Use Data Flow or Stored Procedure activities.
• Perform:
o Null checks, data type checks, range checks, etc.
• Create Validation activities with expressions (e.g., row count > 0).
• Add If Condition or Until Loop activities for conditional logic.
• Log validation results to a control table or send alerts.
• Use a custom logging framework in ADF for monitoring (see the sketch after this list).
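ADF's own checks live in pipeline JSON, but a common complementary pattern is to push row-level checks into a Databricks notebook invoked by ADF: when the notebook raises, the activity fails and an If Condition can branch on the outcome. A minimal sketch, assuming a hypothetical input path and column:

df = spark.read.parquet("/mnt/mydata/orders/")  # hypothetical input

errors = []
if df.count() == 0:
    errors.append("row count is 0")
null_ids = df.filter(df["order_id"].isNull()).count()  # hypothetical column
if null_ids > 0:
    errors.append(f"{null_ids} rows with NULL order_id")

if errors:
    # Raising fails the ADF notebook activity, surfacing the problem upstream.
    raise ValueError("Validation failed: " + "; ".join(errors))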
6. Explain the concept of Azure Data Lake and its integration with SQL-based systems.
Answer:
Azure Data Lake (ADLS) is scalable, secure storage for big data.
Integration with SQL systems:
• Use PolyBase or OPENROWSET in Azure Synapse to query ADLS files.
• External Tables in Synapse map to files in ADLS.
• Use ADF to move/transform data between ADLS and SQL DBs.
• Databricks and Azure SQL DB can integrate for both ETL and analytics workloads (illustrated below).
This hybrid integration enables a flexible, scalable, and cost-effective data architecture.
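To make the Databricks bullet concrete, here is a hedged sketch of an ETL step that reads from ADLS (reusing the /mnt/mydata mount from question 3) and appends to an Azure SQL Database table over JDBC; the server, database, table, and credentials are placeholders:

df = spark.read.parquet("/mnt/mydata/sales/")  # hypothetical source folder

(df.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>")
    .option("dbtable", "dbo.sales_staging")  # hypothetical target table
    .option("user", "<sql-user>")
    .option("password", "<sql-password>")
    .mode("append")
    .save())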
7. How do you handle exceptions and errors in Python?
Answer:
Use try-except-finally blocks:
try:
    # risky code
    x = 10 / 0
except ZeroDivisionError as e:
    print(f"Error occurred: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
finally:
    print("Cleanup actions if needed.")
• Use the logging module instead of print for error logging (example below).
• You can also raise custom exceptions using raise.
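A brief sketch combining both points; the DataValidationError name and the row-count check are invented for illustration:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class DataValidationError(Exception):
    """Hypothetical custom exception for pipeline checks."""

def require_rows(count: int) -> None:
    if count == 0:
        raise DataValidationError("No rows loaded.")

try:
    require_rows(0)
except DataValidationError:
    # logger.exception records the full traceback, unlike print.
    logger.exception("Validation failed")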
8. What is the process of normalization, and why is it required?
Answer:
Normalization is organizing data to reduce redundancy and improve integrity.
Steps (Normal Forms):
• 1NF: Remove repeating groups.
• 2NF: Remove partial dependencies.
• 3NF: Remove transitive dependencies (e.g., move a customer's address out of an orders table into a customers table, since address depends on the customer, not on the order).
Why it's required:
• Reduces data redundancy.
• Improves consistency and integrity.
• Makes data easier to maintain.
However, in analytics workloads, denormalization is often preferred for performance.