
EXL Data Engineering Interview Q & A

✍️ By MONU SINGH | 11/18/2025


Here are 8 important questions and answers from the EXL Data Engineering interview (2020).

 

 

1. Difference between TRUNCATE and DELETE in SQL

Answer:

| Feature | DELETE | TRUNCATE |
| --- | --- | --- |
| Operation type | DML (Data Manipulation Language) | DDL (Data Definition Language) |
| WHERE clause | Can use WHERE to delete specific rows | Cannot filter rows; removes all rows |
| Logging | Fully logged (row by row) | Minimal logging |
| Rollback | Can be rolled back | Varies by RDBMS: possible inside a transaction in SQL Server/PostgreSQL, not in Oracle/MySQL |
| Triggers | Fires DELETE triggers | Does not fire triggers |
| Identity reset | Does not reset the identity column | Resets the identity column (in some DBs) |
| Speed | Slower for large tables | Faster for large tables |
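
A quick sketch of the difference in use (the employees table is hypothetical, and transaction syntax varies by RDBMS):

-- DML: removes only matching rows; fires DELETE triggers and can be rolled back
BEGIN TRANSACTION;
DELETE FROM employees WHERE department = 'HR';
ROLLBACK;  -- the deleted rows are restored

-- DDL: removes all rows; no WHERE clause, minimally logged,
-- and resets the identity column in some databases
TRUNCATE TABLE employees;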

 

 

 

2. How do you handle big data processing using Azure HDInsight?

 

Answer:

Azure HDInsight is a cloud-based service for running open-source analytics frameworks such as Hadoop, Spark, and Hive.

Steps for handling big data:

• Cluster Selection: Choose a Spark or Hadoop cluster type depending on the workload.

• Data Storage: Use ADLS or Blob Storage for input/output.

• Data Ingestion: Use tools like Apache Sqoop, Kafka, or ADF.

• Processing: Use Spark jobs for in-memory distributed processing, or Hive/Pig for SQL-based ETL.

• Optimization: Use partitioning, caching, compression, and YARN tuning.

• Monitoring: Use Ambari for cluster monitoring, logs, and performance tuning.
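
As an illustration, here is a minimal PySpark job of the kind you might submit to an HDInsight Spark cluster (a sketch; the storage account, containers, and column names are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical ADLS paths -- replace with your own account and containers
INPUT_PATH = "abfss://raw@mydatalake.dfs.core.windows.net/sales/"
OUTPUT_PATH = "abfss://curated@mydatalake.dfs.core.windows.net/sales_daily/"

spark = SparkSession.builder.appName("sales-daily-etl").getOrCreate()

# Read raw CSV files from the data lake
df = spark.read.option("header", True).csv(INPUT_PATH)

# Aggregate in memory across the cluster
daily = (
    df.withColumn("amount", F.col("amount").cast("double"))
      .groupBy("sale_date")
      .agg(F.sum("amount").alias("total_amount"))
)

# Write partitioned, compressed Parquet for faster downstream reads
daily.write.mode("overwrite").partitionBy("sale_date").parquet(OUTPUT_PATH)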

 

 

 

3. How to implement parallel copies in ADF using partitioning?

 

Answer:

You can implement parallel copies using source partitioning in the Copy Activity.

Steps:

1. In the ADF Copy Activity, go to the Source tab and enable partitioning.

2. Set the partition option:

• Dynamic Range: Provide a partition column and range values (e.g., Date, ID).

• Static Range: Predefine the ranges.

3. Set the degree of parallelism (default is 4).

This splits the data into slices that are copied in parallel, improving performance.
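
Under the hood, these settings appear in the Copy Activity's pipeline JSON roughly as follows (a sketch; the source type, column name, and bounds are hypothetical and the available options vary by connector):

{
  "name": "CopySalesData",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "partitionOption": "DynamicRange",
      "partitionSettings": {
        "partitionColumnName": "ID",
        "partitionLowerBound": "1",
        "partitionUpperBound": "1000000"
      }
    },
    "sink": { "type": "ParquetSink" },
    "parallelCopies": 4
  }
}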

 

 

 

4. Write Python code to replace vowels in a string with spaces.

 

Answer:

def replace_vowels_with_space(s):
    vowels = 'aeiouAEIOU'
    # Replace each vowel (either case) with a space; keep every other character
    return ''.join(' ' if char in vowels else char for char in s)

# Example
print(replace_vowels_with_space("Data Engineer"))
# Output: "D t   ng n  r"

 

 

 

5. How do you implement data encryption at rest and in transit in ADLS?

 

Answer:

Encryption at Rest:

• Enabled by default in ADLS using Azure Storage Service Encryption (SSE).

• You can use:

  • Microsoft-managed keys (the default).

  • Customer-managed keys (CMK) stored in Azure Key Vault.

Encryption in Transit:

• Enforced using HTTPS.

• Private Endpoints and VPNs help avoid public internet exposure.

Advanced security: Enable "Secure transfer required", and use firewalls and VNet integration.
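
For example, "Secure transfer required" can be enforced programmatically; below is a minimal sketch with the azure-mgmt-storage Python SDK, assuming placeholder subscription, resource group, and account names:

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import StorageAccountUpdateParameters

# Placeholder identifiers -- substitute your own
client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# "Secure transfer required": reject any non-HTTPS request to the account
client.storage_accounts.update(
    "my-resource-group",
    "mydatalakeaccount",
    StorageAccountUpdateParameters(enable_https_traffic_only=True),
)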

 

 

 

 

6. Describe the use of Azure Synapse Analytics and how it integrates with other Azure services.

Answer:

Azure Synapse Analytics is an integrated analytics platform combining big data processing and data warehousing.

Key Uses:

• Data ingestion, preparation, management, and visualization.

• Run T-SQL queries on both structured and unstructured data.

• Real-time analytics with Spark & SQL engines.

Integration:

• Azure Data Lake: Store raw and curated data.

• ADF: Pipelines for data orchestration.

• Power BI: Dashboarding directly from Synapse.

• Azure ML: Machine learning model training.

• Azure DevOps: CI/CD automation.
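
To illustrate the T-SQL-over-unstructured-data point, a Synapse serverless SQL pool can query Parquet files in the lake directly (a sketch; the storage URL is a placeholder):

SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/curated/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales;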

 

 

 

7. How do you implement continuous integration and continuous deployment (CI/CD) in Azure DevOps?

 

Answer:

Steps:

1. Source Code Management:

• Store ADF/Synapse code in Azure Repos or GitHub.

2. CI Pipeline:

• Trigger on code push.

• Run validation tests or linting.

• Package ARM templates / JSON artifacts for deployment.

3. CD Pipeline:

• Deploy artifacts to the target environment (dev/test/prod).

• Use the az CLI, PowerShell, or built-in deployment tasks.

4. Approvals:

• Add stage approvals for production.

• Use environment-specific variables.
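
A minimal azure-pipelines.yml along these lines (a sketch; the service connection, resource group, and template path are placeholders):

trigger:
  branches:
    include:
      - main

stages:
  - stage: Build
    jobs:
      - job: Validate
        pool:
          vmImage: ubuntu-latest
        steps:
          - script: echo "Run linting / validation of ARM templates here"
          - publish: $(System.DefaultWorkingDirectory)/arm-templates
            artifact: templates

  - stage: DeployProd
    dependsOn: Build
    jobs:
      - deployment: Deploy
        environment: production   # stage approvals are configured on this environment
        pool:
          vmImage: ubuntu-latest
        strategy:
          runOnce:
            deploy:
              steps:
                - task: AzureCLI@2
                  inputs:
                    azureSubscription: my-service-connection   # placeholder
                    scriptType: bash
                    scriptLocation: inlineScript
                    inlineScript: |
                      az deployment group create \
                        --resource-group my-rg \
                        --template-file $(Pipeline.Workspace)/templates/template.json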

 

 

 

8. Explain the role of metadata in data modeling and data architecture.

 

Answer:

Metadata is data about data.

In Data Modeling:

• Describes data structure (columns, data types, constraints).

• Defines relationships between tables (PK, FK).

• Helps tools understand schema and validation rules.

In Data Architecture:

• Tracks data lineage and provenance.

• Facilitates governance, cataloging, and compliance.

• Used in tools like Azure Purview and Databricks Unity Catalog.

It acts as a blueprint that improves discoverability, quality, and trust in data systems.
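
As a concrete example of structural metadata, most relational databases expose it through catalog views such as the standard INFORMATION_SCHEMA (the table name here is hypothetical):

SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'employees';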

 
