TCS Data Engineering Interview Questions 2020

✍️ By MONU SINGH | 11/18/2025

Here are 8 important questions and answers asked in the TCS Data Engineering interview, 2020.

 

1. How many jobs, stages, and tasks are created during a Spark job execution?

Answer:

 In Spark:

• Job: Triggered by an action (e.g., collect, count, save, etc.).

• Stage: A job is divided into stages based on wide transformations (shuffle boundaries).

• Task: A stage is divided into tasks, with each task processing a partition of data.

Example: If a Spark job reads data, performs a groupBy, and writes the result:

• It could be 1 job,

• Split into 2 stages (before and after the shuffle),

• With N tasks per stage (equal to the number of partitions in that stage), as sketched below.
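A minimal PySpark sketch of this pattern (assuming an active SparkSession named spark; the input and output paths are hypothetical):

df = spark.read.option("header", "true").csv("/mnt/datalake/raw/events.csv")    # narrow: no shuffle
counts = df.groupBy("event_type").count()                                       # wide: shuffle boundary
counts.write.mode("overwrite").parquet("/mnt/datalake/curated/event_counts")    # action: launches the job

The write is the action that triggers execution; Spark splits the work into a stage before the shuffle and a stage after it, with one task per partition in each stage.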

 

 

 

2. What are the activities in ADF (e.g., Copy Activity, Notebook Activity)?

 

Answer:

Common ADF Activities include:

• Copy Activity: Transfers data between source and sink.

• Data Flow Activity: Transforms data using Mapping Data Flows (visual ETL).

• Notebook Activity: Executes Databricks notebooks.

• Stored Procedure Activity: Runs stored procedures in SQL DB.

• Web Activity: Calls a REST endpoint.

• Lookup / Get Metadata / Set Variable: Used for control flow.

Each activity serves a unique role in orchestrating end-to-end pipelines.

 

 

 

3. How do you integrate ADLS with Azure Databricks for data processing?

 

Answer:

Integration steps:

1. Mount ADLS to Databricks using either a SAS token or a service principal (recommended, with secrets stored in Azure Key Vault):

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<client-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="kv-scope", key="client-secret"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenantid>/oauth2/token"
}

dbutils.fs.mount(
    source="abfss://<container-name>@<storageaccount>.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs
)

2. Read from and write to /mnt/datalake/... like any other file system (see the sketch below).
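A minimal read/write sketch against the mount (assuming the mount above succeeded; the file path /mnt/datalake/raw/sales.csv is hypothetical):

# Read a CSV from the mounted lake and write it back out as a Delta table.
df = spark.read.option("header", "true").csv("/mnt/datalake/raw/sales.csv")
df.write.format("delta").mode("overwrite").save("/mnt/datalake/curated/sales")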

 

 

 

4. Write Python code to check if a string is a palindrome.

 

Answer:

def is_palindrome(s):
    s = s.lower().replace(" ", "")
    return s == s[::-1]

print(is_palindrome("Madam"))  # True

 

 

 

5. How do you implement data governance in a data lake environment?

 

Answer:

Key components of data governance in a data lake:

• Data Cataloging: Use Microsoft Purview (formerly Azure Purview) for metadata management.

• Access Control: Apply RBAC + ACLs on ADLS Gen2.

• Data Classification: Tag sensitive data for security.

• Data Lineage: Track data transformations using Purview or Databricks lineage.

• Auditing: Enable logging for all access and changes.

• Schema Enforcement: Use Delta Lake to maintain schema (see the sketch below).
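A minimal sketch of Delta Lake schema enforcement on Databricks (assumptions: the spark session is available as in question 3, and the table path is hypothetical):

from pyspark.sql import Row

# Create a governed Delta table with a fixed schema.
df = spark.createDataFrame([Row(id=1, name="Asha")])
df.write.format("delta").mode("overwrite").save("/mnt/datalake/governed/customers")

# An append with an unexpected extra column is rejected by Delta's schema
# enforcement unless mergeSchema is explicitly enabled.
bad = spark.createDataFrame([Row(id=2, name="Ravi", email="ravi@example.com")])
try:
    bad.write.format("delta").mode("append").save("/mnt/datalake/governed/customers")
except Exception as e:
    print("Write blocked by schema enforcement:", e)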

 

 

 

6. Explain the differences between Azure SQL Database and Azure SQL Managed Instance.

 

Answer:

Feature comparison (Azure SQL Database vs. Azure SQL Managed Instance):

• Managed by Azure: Yes / Yes

• SQL Server compatibility: Partial / Full (including SQL Agent, SSIS)

• VNET support: Limited / Full VNET integration

• Use cases: Cloud-first, modern apps / Lift-and-shift legacy apps

Managed Instance is closer to full SQL Server, which makes it a great fit for migrations.

 

 

 

7. How do you monitor and troubleshoot issues in Azure SQL Database?

 

 

Answer:

Monitoring Tools:

• Query Performance Insight: Shows resource-consuming queries.

• SQL Auditing: Captures DB activities.

• Azure Monitor / Log Analytics: Tracks performance and issues.

• DMVs (Dynamic Management Views): For real-time diagnostics.

Troubleshooting Tips:

• Query sys.dm_exec_requests and sys.dm_exec_query_stats (see the sketch below).

• Analyze deadlocks and long-running queries.

• Set alerts on CPU/memory utilization.
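A minimal sketch of querying a DMV from Python (assumptions: the pyodbc package and the ODBC Driver 18 for SQL Server are installed, and the connection-string placeholders are replaced with real values):

import pyodbc

# Connect to the Azure SQL Database and list currently running requests,
# longest-running first, straight from sys.dm_exec_requests.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<server>.database.windows.net;Database=<database>;"
    "Uid=<user>;Pwd=<password>;Encrypt=yes;"
)
cursor = conn.cursor()
cursor.execute(
    "SELECT session_id, status, command, total_elapsed_time "
    "FROM sys.dm_exec_requests ORDER BY total_elapsed_time DESC"
)
for row in cursor.fetchall():
    print(row.session_id, row.status, row.command, row.total_elapsed_time)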

 

 

8. Describe the process of data ingestion in Azure Synapse.

 

Answer:

Ingestion in Synapse includes:

• Using COPY INTO from external sources like ADLS, Blob Storage, or external tables.

• Synapse Pipelines (ADF-like) to orchestrate batch loads.

• Linked Services for connections.

• Integration with PolyBase for fast bulk loading.

• Streaming ingestion using Event Hubs or Kafka with Spark pools.

Ingested data lands in dedicated SQL pools, serverless SQL, or Spark tables, depending on the architecture (a Spark-pool example is sketched below).
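A minimal sketch of batch ingestion with a Synapse Spark pool (assumptions: the notebook runs in a Synapse workspace with access to the storage account, and the container, storage account, and path are hypothetical placeholders):

# Read raw Parquet files from ADLS Gen2 and land them in a Spark table.
df = spark.read.parquet("abfss://raw@<storage-account>.dfs.core.windows.net/sales/")
df.write.mode("overwrite").saveAsTable("staging_sales")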
