EXL Data Engineering Interview Q & A
✍️ By MONU SINGH | 11/18/2025
Here are 8 important questions and answers asked in the EXL Data Engineering Interview 2020.
1. Difference between TRUNCATE and DELETE in SQL
Answer:
| Feature | DELETE | TRUNCATE |
|---|---|---|
| Operation Type | DML (Data Manipulation Language) | DDL (Data Definition Language) |
| WHERE Clause | Can use WHERE to delete specific rows | Cannot filter rows; removes all rows |
| Logging | Fully logged (row by row) | Minimal logging |
| Rollback | Can be rolled back | Can be rolled back inside a transaction in some RDBMS (e.g., SQL Server, PostgreSQL); not in MySQL or Oracle, where it commits implicitly |
| Triggers | Activates DELETE triggers | Does not activate DELETE triggers |
| Identity Reset | Doesn't reset identity column | Resets identity column (in some DBs) |
| Speed | Slower for large data | Faster for large data |
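To make the DELETE column concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are made up for illustration. SQLite has no TRUNCATE statement, so only the DELETE behaviour (WHERE filter and rollback) is shown.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("open",), ("cancelled",), ("cancelled",)],
)
conn.commit()

# DELETE is DML: it accepts a WHERE clause and removes only matching rows.
cur = conn.execute("DELETE FROM orders WHERE status = ?", ("cancelled",))
deleted_rows = cur.rowcount

# The delete has not been committed yet, so it can still be rolled back.
conn.rollback()
remaining_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

print(deleted_rows, remaining_rows)
```

After the rollback, the table holds all of its original rows again, which is the behaviour an interviewer usually wants you to contrast with TRUNCATE.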
2. How do you handle big data processing using Azure HDInsight?
Answer:
Azure HDInsight is a cloud-based service for open-source analytics frameworks like Hadoop, Spark, Hive, etc.
Steps for handling big data:
• Cluster Selection: Choose Spark/Hadoop cluster type depending on workload.
• Data Storage: Use ADLS or Blob Storage for input/output.
• Data Ingestion: Use tools like Apache Sqoop, Kafka, or ADF.
• Processing:
  o Use Spark jobs for in-memory distributed processing.
  o Use Hive or Pig for SQL-based ETL.
• Optimization: Use partitioning, caching, compression, and YARN tuning.
• Monitoring: Use Ambari for cluster monitoring, logs, and performance tuning.
3. How to implement parallel copies in ADF using partitioning?
Answer:
You can implement parallel copy by partitioning the source in the Copy Activity.
Steps:
1. In the ADF Copy Activity, open the Source tab and enable partitioning through the "Partition option" setting.
2. Set the partition option:
  o Dynamic range: Provide a partition column and, optionally, lower and upper bounds (e.g., a date or ID column).
  o Physical partitions of table: Reuse the partitions already defined on the source table.
3. Set the degree of copy parallelism (default is 4).
This breaks the data into slices and copies them in parallel, improving performance.
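The idea behind range partitioning can be sketched in plain Python. This is only an illustration of how a column range might be cut into slices that are then copied in parallel; it is not ADF's internal algorithm, and the table name dbo.Orders and column OrderId are hypothetical.

```python
def split_range(lower, upper, slices):
    """Split the inclusive integer range [lower, upper] into contiguous slices."""
    if slices < 1 or upper < lower:
        raise ValueError("need slices >= 1 and upper >= lower")
    total = upper - lower + 1
    slices = min(slices, total)  # never create empty slices
    base, extra = divmod(total, slices)
    bounds, start = [], lower
    for i in range(slices):
        # The first `extra` slices get one additional value each.
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size - 1))
        start += size
    return bounds


def slice_queries(table, column, lower, upper, slices):
    """Build one filtered query per slice; each could run as its own copy."""
    return [
        f"SELECT * FROM {table} WHERE {column} BETWEEN {lo} AND {hi}"
        for lo, hi in split_range(lower, upper, slices)
    ]


for query in slice_queries("dbo.Orders", "OrderId", 1, 1000, 4):
    print(query)
```

Each generated query covers a non-overlapping slice of the key range, so the slices can be read independently without duplicating rows.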
4. Write Python code to replace vowels in a string with spaces.
Answer:
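def replace_vowels_with_space(s):
    """Return s with every vowel replaced by a single space."""
    vowels = 'aeiouAEIOU'
    return ''.join(' ' if char in vowels else char for char in s)

# Example
print(replace_vowels_with_space("Data Engineer"))
# Output (the uppercase "E" is a vowel too, so it also becomes a space):
# D t   ng n  r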
5. How do you implement data encryption at rest and in transit in ADLS?
Answer:
Encryption at Rest:
• Enabled by default in ADLS using Azure Storage Service Encryption (SSE).
• You can use:
  o Microsoft-managed keys (default).
  o Customer-managed keys (CMK) stored in Azure Key Vault.
Encryption in Transit:
• Enforced using HTTPS.
• Private Endpoints and VPNs help avoid public internet exposure.
Advanced security: Enable Secure Transfer Required, and use firewalls and VNet integration.
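On the client side, a pipeline can also refuse to talk to a non-HTTPS endpoint before any request is made. The sketch below is only a small guard written with the standard library; it complements the Secure Transfer Required setting on the storage account and does not replace it. The account name examplelake is hypothetical.

```python
from urllib.parse import urlparse


def require_https(url):
    """Return the URL unchanged if it uses HTTPS, otherwise raise ValueError."""
    scheme = urlparse(url).scheme.lower()
    if scheme != "https":
        raise ValueError(f"refusing non-HTTPS endpoint: {url}")
    return url


# Hypothetical storage account name, used only for illustration.
endpoint = require_https("https://examplelake.dfs.core.windows.net")
print(endpoint)
```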
6. Describe the use of Azure Synapse Analytics and how it integrates with other Azure services.
Answer:
Azure Synapse is an integrated analytics platform combining big data and data warehousing.
Key Uses:
• Data ingestion, preparation, management, and visualization.
• Run T-SQL queries on both structured and unstructured data.
• Real-time analytics with Spark & SQL engines.
Integration:
• Azure Data Lake: Store raw and curated data.
• ADF: Pipelines for data orchestration.
• Power BI: For dashboarding directly from Synapse.
• Azure ML: For machine learning model training.
• Azure DevOps: For CI/CD automation.
7. How do you implement continuous integration and continuous deployment (CI/CD) in Azure DevOps?
Answer:
Steps:
1. Source Code Management:
  o Store ADF/Synapse code in Azure Repos or GitHub.
2. CI Pipeline:
  o Trigger on code push.
  o Run validation tests or linting.
  o Package ARM templates / JSONs for deployment.
3. CD Pipeline:
  o Deploy artifacts to target environment (dev/test/prod).
  o Use az CLI, PowerShell, or built-in deployment tasks.
4. Approvals:
  o Add stage approvals for production.
  o Use environment-specific variables.
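As an example of the "validation tests" step in the CI pipeline, here is a minimal sketch of a check a build could run over pipeline JSON files before packaging them. The rules (a non-empty name and a non-empty properties.activities list) and the function names are assumptions chosen for illustration, not an official ADF validation.

```python
import json
from pathlib import Path


def validate_pipeline_text(text, label="pipeline"):
    """Return a list of problems found in one pipeline definition (JSON text)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"{label}: invalid JSON ({exc.msg})"]
    if not isinstance(doc, dict):
        return [f"{label}: top level must be a JSON object"]

    problems = []
    if not doc.get("name"):
        problems.append(f"{label}: missing 'name'")
    props = doc.get("properties")
    activities = props.get("activities") if isinstance(props, dict) else None
    if not isinstance(activities, list) or not activities:
        problems.append(f"{label}: 'properties.activities' is missing or empty")
    return problems


def validate_folder(folder):
    """Validate every *.json file in a folder and collect all problems."""
    problems = []
    for path in sorted(Path(folder).glob("*.json")):
        text = path.read_text(encoding="utf-8")
        problems.extend(validate_pipeline_text(text, path.name))
    return problems


good = '{"name": "CopyOrders", "properties": {"activities": [{"name": "Copy1", "type": "Copy"}]}}'
bad = '{"properties": {"activities": []}}'
print(validate_pipeline_text(good, "good.json"))
print(validate_pipeline_text(bad, "bad.json"))
```

In a CI job, a script like this would call validate_folder on the repository folder that holds the pipeline files and fail the build if the returned list is not empty.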
8. Explain the role of metadata in data modeling and data architecture.
Answer:
Metadata is data about data.
In Data Modeling:
• Describes data structure (columns, data types, constraints).
• Defines relationships between tables (PK, FK).
• Helps tools understand schema and validation rules.
In Data Architecture:
• Tracks data lineage and provenance.
• Facilitates governance, cataloging, and compliance.
• Used in tools like Azure Purview (now Microsoft Purview) and Databricks Unity Catalog.
It acts as a blueprint that improves discoverability, quality, and trust in data systems.
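A small sketch can show what this metadata looks like when it is written down explicitly. The classes below are hypothetical and are not the model used by any catalog tool; they only capture structure (columns and types), relationships (PK, FK), and lineage (upstream sources), plus a simple consistency check.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str
    nullable: bool = True


@dataclass
class Table:
    name: str
    columns: list
    primary_key: str
    foreign_keys: dict = field(default_factory=dict)  # column -> "table.column"
    upstream: list = field(default_factory=list)      # lineage: source datasets


def validate(table):
    """Check that key metadata refers to columns that actually exist."""
    names = {c.name for c in table.columns}
    errors = []
    if table.primary_key not in names:
        errors.append(f"primary key {table.primary_key!r} is not a column")
    for col in table.foreign_keys:
        if col not in names:
            errors.append(f"foreign key {col!r} is not a column")
    return errors


orders = Table(
    name="orders",
    columns=[Column("order_id", "INT", nullable=False), Column("customer_id", "INT")],
    primary_key="order_id",
    foreign_keys={"customer_id": "customers.customer_id"},
    upstream=["raw/orders.csv"],
)
print(validate(orders))
```

An empty list means the declared keys match the declared columns, which is the kind of rule a modeling or catalog tool can enforce automatically once the metadata exists.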