LatentView Data Engineering Interview Questions 2021
✍️ By MONU SINGH | 11/18/2025
Here are 8 important questions and answers asked in the LatentView Data Engineering interview, 2020.
1. Explain the purpose of SparkContext and SparkSession.
Answer:
• SparkContext:
o The entry point to Spark Core functionality. It sets up internal services and establishes a connection to a Spark execution environment.
o The primary entry point in older Spark versions (<2.0). It still exists, but is now usually reached through a SparkSession.
o Example: sc = SparkContext(appName="MyApp")
• SparkSession:
o Introduced in Spark 2.0 as the new entry point for all Spark functionality, including Spark SQL and DataFrame APIs.
o Internally, it creates a SparkContext (available as spark.sparkContext).
o Recommended over SparkContext for all modern use.
o Example: spark = SparkSession.builder.appName("MyApp").getOrCreate()
2. How to handle incremental load in PySpark when the table lacks a last_modified column?
Answer:
If there's no last_modified or created_at column:
• Option 1: Use checksum/hash comparison. Generate a hash of the relevant columns for each row and compare it with the hash stored in the previous snapshot. Rows with a new key or a changed hash form the increment.
• Option 2: Use change tracking tables or CDC (Change Data Capture) from the source, if supported.
• Option 3: If the table is small, do a full load and deduplicate at the destination using surrogate keys or primary keys.
• Option 4: Use record versioning, if available via audit logs.
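The hash comparison in Option 1 can be sketched in plain Python. This is only an illustration of the logic, not a production loader: the sample rows, the column names, and the helper names row_hash and detect_changes are all hypothetical. In PySpark the same idea is usually expressed with sha2 over concat_ws of the tracked columns, followed by a join against the stored snapshot.

```python
import hashlib

def row_hash(row, columns):
    """Return a SHA-256 hex digest over the chosen columns of one row."""
    # A unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    # Simplification: None and "" produce the same hash here.
    joined = "\x1f".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def detect_changes(current_rows, previous_hashes, key, columns):
    """Compare the current extract with the stored {key: hash} snapshot."""
    inserts, updates, current_hashes = [], [], {}
    for row in current_rows:
        digest = row_hash(row, columns)
        current_hashes[row[key]] = digest
        if row[key] not in previous_hashes:
            inserts.append(row)   # key never seen before
        elif previous_hashes[row[key]] != digest:
            updates.append(row)   # same key, different content
    deleted_keys = [k for k in previous_hashes if k not in current_hashes]
    return inserts, updates, deleted_keys, current_hashes

# Hypothetical data: the previous snapshot and today's full extract.
tracked = ["name", "city"]
yesterday = [
    {"id": 1, "name": "Ann", "city": "Pune"},
    {"id": 2, "name": "Raj", "city": "Delhi"},
    {"id": 3, "name": "Lee", "city": "Chennai"},
]
today = [
    {"id": 1, "name": "Ann", "city": "Pune"},    # unchanged
    {"id": 2, "name": "Raj", "city": "Mumbai"},  # changed
    {"id": 4, "name": "Zoe", "city": "Kochi"},   # new
]
snapshot = {r["id"]: row_hash(r, tracked) for r in yesterday}
inserts, updates, deleted_keys, new_snapshot = detect_changes(
    today, snapshot, "id", tracked
)
print([r["id"] for r in inserts], [r["id"] for r in updates], deleted_keys)
# prints: [4] [2] [3]
```

After each run, new_snapshot would be persisted and used as previous_hashes for the next load.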
3. How do you use Azure Stream Analytics for real-time data processing?
Answer:
• Azure Stream Analytics (ASA) is a real-time event processing engine.
• Steps:
1. Define input: IoT Hub, Event Hubs, or Blob Storage.
2. Define query: Use SQL-like syntax to process data (aggregations, joins, filters).
3. Define output: Azure SQL, Power BI, ADLS, Blob, etc.
• Example query:
SELECT deviceId, AVG(temperature) AS avgTemp
INTO outputAlias
FROM inputAlias
GROUP BY deviceId, TumblingWindow(minute, 1)
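To make the windowing in the example query concrete, here is a small Python sketch of what a 1-minute tumbling window does: fixed-size, non-overlapping buckets, with one average per device per bucket. The event shape (ts in epoch seconds, deviceId, temperature) and the function name tumbling_avg are assumptions for illustration, and the bucket boundaries used here (start inclusive, end exclusive) are a simplification that may not match how the real service treats events exactly on a boundary.

```python
from collections import defaultdict

def tumbling_avg(events, window_seconds=60):
    """Average temperature per (deviceId, window_start) over fixed windows."""
    sums = defaultdict(lambda: [0.0, 0])  # (deviceId, window_start) -> [sum, count]
    for e in events:
        # Every timestamp falls into exactly one non-overlapping window.
        window_start = (e["ts"] // window_seconds) * window_seconds
        acc = sums[(e["deviceId"], window_start)]
        acc[0] += e["temperature"]
        acc[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}

# Hypothetical events; ts is seconds since some epoch.
events = [
    {"deviceId": "d1", "ts": 0, "temperature": 20.0},
    {"deviceId": "d1", "ts": 30, "temperature": 22.0},
    {"deviceId": "d1", "ts": 61, "temperature": 30.0},
    {"deviceId": "d2", "ts": 10, "temperature": 15.0},
]
averages = tumbling_avg(events)
for (device, start), avg in sorted(averages.items()):
    print(device, start, avg)
```

Device d1 gets two results because its third reading lands in the second minute, which is the behavior GROUP BY deviceId, TumblingWindow(minute, 1) expresses.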
4. What are the security features available in ADLS (e.g., access control lists, role-based access)?
Answer:
Security features in Azure Data Lake Storage Gen2:
• Role-Based Access Control (RBAC):
o Assigns permissions at subscription, resource group, storage account, or container level via Azure AD roles.
• Access Control Lists (ACLs):
o Fine-grained permissions at folder and file level.
o Supports POSIX-style permissions (read, write, execute).
• Network Security:
o Virtual Network (VNet) service endpoints, firewalls, and private endpoints.
• Encryption:
o Data is encrypted at rest with Microsoft-managed or customer-managed keys (CMK).
o Supports HTTPS for encryption in transit.
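As a rough illustration of the POSIX-style ACLs mentioned above, the sketch below parses a short-form ACL string of scope:qualifier:permissions entries and checks a single permission. It is deliberately simplified: it ignores the mask entry, default ACLs (entries prefixed with default:), and the way RBAC and ACLs are evaluated together. The named user analyst and the helpers parse_acl and allows are hypothetical.

```python
def parse_acl(acl_text):
    """Turn 'scope:qualifier:perms' entries into {(scope, qualifier): perms}."""
    entries = {}
    for entry in acl_text.split(","):
        scope, qualifier, perms = entry.strip().split(":")
        if len(perms) != 3:
            raise ValueError(f"expected three permission characters in {entry!r}")
        entries[(scope, qualifier)] = perms
    return entries

def allows(perms, action):
    """Return True if a permission string such as 'r-x' grants 'r', 'w', or 'x'."""
    position = {"r": 0, "w": 1, "x": 2}[action]
    return perms[position] == action

# Owning user: full access; named user 'analyst': read + execute; others: nothing.
acl = parse_acl("user::rwx,user:analyst:r-x,group::r-x,other::---")
print(allows(acl[("user", "analyst")], "r"))  # True
print(allows(acl[("user", "analyst")], "w"))  # False
```

An empty qualifier refers to the owning user or owning group of the file or folder.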
5. Write a SQL query to remove duplicate rows from a table.
Answer:
DELETE FROM my_table
WHERE id NOT IN (
    SELECT MIN(id)
    FROM my_table
    GROUP BY column1, column2, column3
);
This assumes id is a unique identifier; the row with the smallest id in each group of duplicates is kept. Adjust the columns based on what defines a duplicate.
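The query can be tried end to end with Python's built-in sqlite3 module. The table contents below are made up for the demonstration, and other engines may need a different form (MySQL, for example, generally requires wrapping the subquery in a derived table when deleting from the same table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT, column3 TEXT)"
)
# Hypothetical rows: ids 2 and 4 duplicate id 1, id 5 duplicates id 3.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, "a", "x", "p"),
        (2, "a", "x", "p"),
        (3, "b", "y", "q"),
        (4, "a", "x", "p"),
        (5, "b", "y", "q"),
        (6, "c", "z", "r"),
    ],
)

# Keep the smallest id in each group of identical (column1, column2, column3).
conn.execute(
    """
    DELETE FROM my_table
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM my_table
        GROUP BY column1, column2, column3
    )
    """
)
remaining = conn.execute(
    "SELECT id, column1, column2, column3 FROM my_table ORDER BY id"
).fetchall()
print(remaining)
# prints: [(1, 'a', 'x', 'p'), (3, 'b', 'y', 'q'), (6, 'c', 'z', 'r')]
```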
6. How do you manage data lifecycle policies in ADLS?
Answer:
Use Azure Blob Storage lifecycle management policies:
• Define rules for automatic tiering or deletion based on blob age or last modified date.
Examples:
• Move blobs to the cool tier after 30 days.
• Delete blobs older than 180 days.
Steps:
1. Go to storage account > Data Management > Lifecycle Management.
2. Add rule: Choose conditions and actions.
3. Apply to selected containers or blob prefixes.
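The same two example rules can also be written as a JSON policy document instead of clicking through the portal. The sketch below builds that document in Python; the rule name tier-then-delete and the prefix raw/logs/ are placeholders, and the field names follow the lifecycle policy schema as I understand it, so check them against the current Azure documentation before applying a policy.

```python
import json

def lifecycle_policy(prefix, cool_after_days=30, delete_after_days=180):
    """Build a lifecycle policy: tier to cool, then delete, by days since last modification."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tier-then-delete",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    # Limit the rule to block blobs under one container/prefix.
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterModificationGreaterThan": cool_after_days
                            },
                            "delete": {
                                "daysAfterModificationGreaterThan": delete_after_days
                            },
                        }
                    },
                },
            }
        ]
    }

policy = lifecycle_policy("raw/logs/")  # placeholder container/prefix
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting JSON is the kind of document that can be pasted into the portal's code view or supplied to deployment tooling.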
7. What are the key considerations for designing a scalable data architecture in Azure?
Answer:
• Modular design: Decouple ingestion, processing, and serving layers.
• Use of scalable services: Databricks, Synapse, ADF, ADLS Gen2.
• Partitioning: Partition data effectively to enable parallelism.
• Security & governance: Use Purview, Key Vault, RBAC/ACLs.
• Monitoring: Implement telemetry/logging with Azure Monitor, Log Analytics.
• Automation: Use CI/CD and infrastructure as code (ARM/Bicep/Terraform).
• Cost optimization: Use proper storage tiers, autoscaling, and data lifecycle rules.
8. How do you integrate Azure Key Vault with other Azure services?
Answer:
• Azure Key Vault is used to store secrets, keys, and certificates securely.
Integration Examples:
• ADF: Create an Azure Key Vault linked service, then reference its secrets (type AzureKeyVaultSecret) from other linked services instead of storing credentials inline.
• Databricks: Use secret scopes backed by Key Vault and read values with dbutils.secrets.get(scope, key).
• App Services / Functions: Reference secrets in application settings (exposed as environment variables) using the @Microsoft.KeyVault(...) syntax in configuration.
• Synapse: Secure credentials for Linked Services.
Steps:
1. Assign a Managed Identity to the service.
2. Grant that identity access in Key Vault, through an access policy or an Azure RBAC role.
3. Reference the secret in the service configuration.
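Step 3 looks different per service. As a hedged sketch, the helpers below build two common reference shapes: an App Service / Functions application setting value, and the secret reference fragment used inside an ADF linked service definition. The vault, secret, and linked service names are placeholders, and the helper names are hypothetical; the code only constructs configuration values and does not call Azure.

```python
import json

def app_setting_reference(vault_name, secret_name):
    """App Service / Functions app setting value that points at a Key Vault secret."""
    return f"@Microsoft.KeyVault(VaultName={vault_name};SecretName={secret_name})"

def adf_secret_reference(kv_linked_service, secret_name):
    """Fragment for an ADF linked service property (e.g. a password) stored in Key Vault."""
    return {
        "type": "AzureKeyVaultSecret",
        "store": {
            "referenceName": kv_linked_service,  # name of the Key Vault linked service
            "type": "LinkedServiceReference",
        },
        "secretName": secret_name,
    }

# Placeholder names for illustration only.
setting = app_setting_reference("myvault", "db-password")
fragment = adf_secret_reference("MyKeyVaultLinkedService", "db-password")
print(setting)
print(json.dumps(fragment, indent=2))
```

In both cases the secret value itself never appears in the configuration; the service resolves it at runtime using its managed identity.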