TCS Data Engineering Interview Questions 2020
✍️ By MONU SINGH | 11/18/2025
Here are 8 important questions and answers asked in the TCS Data Engineering interview, 2020.
1. How many jobs, stages, and tasks are created during a Spark job execution?
Answer:
In Spark:
• Job: Triggered by an action (e.g., collect, count, save).
• Stage: A job is divided into stages at wide transformations (shuffle boundaries).
• Task: A stage is divided into tasks, with each task processing one partition of data.
Example: If a Spark job reads data, performs a groupBy, and writes the result, it could be 1 job, split into 2 stages (before and after the shuffle), with N tasks per stage (equal to the number of partitions in that stage). A minimal PySpark sketch of this is shown below.
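A minimal sketch illustrating those boundaries (assumes a local SparkSession; the job/stage/task counts are what the Spark UI would show for a job of this shape):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local[*]").appName("job-stage-task-demo").getOrCreate()

# Narrow transformations stay within the first stage
df = spark.range(1_000_000).withColumn("key", F.col("id") % 10)

# groupBy is a wide transformation: it introduces a shuffle, i.e. a stage boundary
agg = df.groupBy("key").count()

# collect() is the action that triggers the job: 1 job, 2 stages,
# with one task per partition in each stage (check the Spark UI)
print(agg.collect())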
2. What are the activities in ADF (e.g., Copy Activity, Notebook Activity)?
Answer:
Common ADF activities include:
• Copy Activity: Transfers data between a source and a sink.
• Data Flow Activity: Transforms data using Mapping Data Flows (visual ETL).
• Notebook Activity: Executes Databricks notebooks.
• Stored Procedure Activity: Runs stored procedures in a SQL database.
• Web Activity: Calls a REST endpoint.
• Lookup / Get Metadata / Set Variable: Used for control flow.
Each activity serves a distinct role in orchestrating end-to-end pipelines; a sketch of how a Copy Activity appears in a pipeline definition follows.
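For a rough idea of the shape: ADF pipelines are defined as JSON. Below is a trimmed sketch, expressed as a Python dict, of a pipeline with a single Copy Activity (the pipeline, activity, and dataset names are all made up for illustration):

pipeline = {
    "name": "CopyBlobToSqlPipeline",          # hypothetical pipeline name
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToSql",  # hypothetical activity name
                "type": "Copy",
                "inputs": [{"referenceName": "BlobInputDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SqlOutputDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "BlobSource"},
                    "sink": {"type": "SqlSink"},
                },
            }
        ]
    },
}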
3. How do you integrate ADLS with Azure Databricks for data processing?
Answer:
Integration steps:
1. Mount ADLS to Databricks using either:
o a SAS token, or
o a service principal (recommended, with secrets kept in Azure Key Vault):

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<client-id>",
    # Pull the secret from Key Vault via a Databricks secret scope
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="kv-scope", key="client-secret"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)
2. Read/write from/to /mnt/datalake/... like any file system, as in the sketch below.
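A minimal usage sketch once the mount exists (the folder layout under the mount is hypothetical):

# Read from the mounted lake like a local path
df = spark.read.parquet("/mnt/datalake/raw/events/")   # hypothetical folder

# Write results back, e.g. as Delta
df.write.format("delta").mode("overwrite").save("/mnt/datalake/curated/events/")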
4. Write Python code to check if a string is a palindrome.
Answer:

def is_palindrome(s):
    # Normalize: lowercase and drop spaces so phrases compare cleanly
    s = s.lower().replace(" ", "")
    # A palindrome reads the same forwards and backwards
    return s == s[::-1]

print(is_palindrome("Madam"))  # True
5. How do you implement data governance in a data lake environment?
Answer:
Key components of data governance in a data lake:
• Data Cataloging: Use Microsoft Purview (formerly Azure Purview) for metadata management.
• Access Control: Apply RBAC + ACLs on ADLS Gen2.
• Data Classification: Tag sensitive data for security.
• Data Lineage: Track data transformations using Purview or Databricks lineage.
• Auditing: Enable logging for all access and changes.
• Schema Enforcement: Use Delta Lake to maintain schema (see the sketch after this list).
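To make schema enforcement concrete: Delta Lake rejects appends whose schema does not match the table. A minimal sketch (the table path and column names are hypothetical; assumes a Databricks cluster or a local delta-spark setup):

# Create a small Delta table
df = spark.createDataFrame([(1, "alice")], ["id", "name"])
df.write.format("delta").save("/mnt/datalake/governed/users")

# An append with an unexpected column fails with an AnalysisException
# unless the write explicitly opts in with .option("mergeSchema", "true")
bad = spark.createDataFrame([(2, "bob", "extra")], ["id", "name", "unexpected_col"])
bad.write.format("delta").mode("append").save("/mnt/datalake/governed/users")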
6. Explain the differences between Azure SQL Database and Azure SQL Managed Instance.
Answer:

Feature                    | Azure SQL Database       | Azure SQL Managed Instance
Managed by Azure           | Yes                      | Yes
SQL Server compatibility   | Partial                  | Full (including SQL Agent, SSIS)
VNET support               | Limited                  | Full VNET integration
Use cases                  | Cloud-first, modern apps | Lift-and-shift legacy apps

Managed Instance is closer to full SQL Server, which makes it a good fit for migrations.
7. How do you monitor and troubleshoot issues in Azure SQL Database?
Answer:
Monitoring tools:
• Query Performance Insight: Shows resource-consuming queries.
• SQL Auditing: Captures database activities.
• Azure Monitor / Log Analytics: Tracks performance and issues.
• DMVs (Dynamic Management Views): For real-time diagnostics.
Troubleshooting tips:
• Use sys.dm_exec_requests and sys.dm_exec_query_stats.
• Analyze deadlocks and long-running queries.
• Set alerts on CPU/memory utilization.
A sketch of querying the DMVs from Python follows.
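A minimal sketch of pulling live request data from sys.dm_exec_requests with pyodbc (the server, database, and credentials are placeholders):

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<server>.database.windows.net;"
    "DATABASE=<database>;"
    "UID=<user>;PWD=<password>"
)

# Currently executing requests, longest-running first
cursor = conn.cursor()
cursor.execute("""
    SELECT session_id, status, command, wait_type, total_elapsed_time
    FROM sys.dm_exec_requests
    ORDER BY total_elapsed_time DESC
""")
for row in cursor.fetchall():
    print(row.session_id, row.status, row.command, row.wait_type, row.total_elapsed_time)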
8. Describe the process of data ingestion in Azure Synapse.
Answer:
Ingestion in Synapse includes:
• Using COPY INTO from external sources such as ADLS, Blob Storage, or external tables.
• Synapse Pipelines (ADF-like) to orchestrate batch loads.
• Linked Services for connections.
• Integration with PolyBase for fast bulk loading.
• Streaming ingestion using Event Hubs or Kafka with Spark pools.
Ingested data lands in dedicated SQL pools, serverless SQL, or Spark tables, depending on the architecture; a Spark-pool sketch is shown below.
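A minimal sketch of the Spark-pool ingestion path inside a Synapse notebook, where a spark session is pre-created (the storage account, container, and table names are hypothetical):

# Read raw CSV files straight from ADLS Gen2
df = (
    spark.read.option("header", "true")
    .csv("abfss://raw@<storage-account>.dfs.core.windows.net/events/")
)

# Land them as a Spark table that serverless SQL or other notebooks can query
df.write.mode("overwrite").saveAsTable("staging_events")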