Loading ...

📚 Chapters

AMAZON Data Analyst interview Questions (2025)

✍️ By ANUJ SINGH | 11/14/2025

 

 Data Analyst interview Questions (2025)

 

SQL Questions (1–7)

 

1.  Find customers who purchased more than 3 times in the last month.

 

SELECT customer_id, COUNT(*) AS purchase_count

FROM orders

WHERE order_date >= DATEADD(month, -1, GETDATE())

GROUP BY customer_id

HAVING COUNT(*) > 3;

 

2.   Write a query to find the second highest salary.

 

SELECT MAX(salary)

FROM employees

WHERE salary < (SELECT MAX(salary) FROM employees);

 

3. What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?

 

                RANK(): Skips numbers after ties.

                DENSE_RANK(): No gaps in ranking.

                ROW_NUMBER(): Unique sequential number regardless of ties.

 

4. Find duplicate records in a table.

 

SELECT customer_id, COUNT(*)

FROM customers

GROUP BY customer_id

HAVING COUNT(*) > 1;

 

5. What’s the difference between WHERE and HAVING?

                WHERE: Filters before aggregation.

                HAVING: Filters after aggregation.

 

 

6. Get the average order value for each customer.

SELECT customer_id, AVG(order_amount) AS avg_value

FROM orders

GROUP BY customer_id;

 

7. How do you optimize a slow SQL query?

 

                Create indexes

                Avoid SELECT *

                Use EXPLAIN PLAN

                Limit subqueries

                Partition large tables

 

 

Python & Pandas Questions (8–12)

 

 

8. Drop missing values from a DataFrame.

df.dropna(inplace=True)


9. Group by and calculate total sales by region.

df.groupby("Region")["Sales"].sum()


10. Find outliers using the IQR method.

Q1 = df['amount'].quantile(0.25)

Q3 = df['amount'].quantile(0.75)

IQR = Q3 - Q1

outliers = df[(df['amount'] < Q1 - 1.5*IQR) | (df['amount'] > Q3 + 1.5*IQR)]

 

 

11. Merge two DataFrames.

pd.merge(df1, df2, on='customer_id', how='inner')

 

 

12. How do you handle large datasets in Python?

                Use dask or modin for parallel processing

                Load data in chunks with read_csv(chunksize=10000)

                Optimize data types (e.g., convert object to category)

 

 

Excel/Power BI Questions (13–15)

 

13. What Excel functions do you use in analysis?

                VLOOKUP, INDEX-MATCH

                IF, IFS, SUMIFS, COUNTIFS

                Pivot Tables, Charts, Slicers

 

 

14. Difference between Calculated Column and Measure in Power BI?

                Column: Calculated row-by-row and stored.

                Measure: Calculated at query time (more efficient for aggregations).

 


15. What are slicers and filters in Power BI?

                Slicers: Visual tools for filtering.

                Filters: Apply filtering at report, page, or visual level.

 

Business Case & Product Questions (16–19)

 

16. What metrics would you track for Amazon delivery performance?

                On-Time Delivery Rate

                Average Delivery Time

                Return Rate

                Customer Satisfaction Score

 

 

17. Design a dashboard to monitor sales performance.

Metrics:

                Total Sales, Profit

                Orders by Region/Category

                Top-Selling Products

                Filters: Time, Region, Category

 

 

18. How would you reduce cart abandonment on Amazon?

                Analyze drop-off steps in checkout funnel

                A/B test different UX changes

                Use ML model to predict high-risk customers

 

 

19. How would you evaluate if a new feature increased sales?

                Use A/B Testing

 

 

                Pre/post analysis of KPIs

                Control for seasonality and external factors

 

A/B Testing Questions (20–21)

 

20. Explain p-value in A/B testing.

                Probability of seeing the observed difference (or more extreme) under the null hypothesis.

                A low p-value (e.g. < 0.05) suggests the difference is statistically significant.

 

 

21. How would you calculate statistical significance in Python?

from scipy.stats import ttest_ind

t_stat, p_val = ttest_ind(group_A, group_B)

 

Behavioral (Leadership Principles) (22–25)

 

22. Tell me about a time you used data to solve a business problem.

In my previous project, I used Power BI to identify why return rates were high in one region. After root-cause analysis, we changed the vendor, reducing returns by 30%.

 

23. Describe a time when you had to dive deep.

I noticed a discrepancy in weekly revenue numbers. I traced it to a duplicate data load and wrote a validation script to catch it before dashboard refresh.

 

24. Tell me about a time you took ownership.

When a data pipeline broke, even though I wasn’t the owner, I debugged it and restored the process to avoid dashboard downtime.


25. Have you ever disagreed with your manager about a decision?

Yes, we disagreed on the metric to use for campaign success. I backed my point with data and showed how conversion rate was a more accurate metric than just clicks. 

💬 Comments

logo

Comments (0)

No comments yet. Be the first to share your thoughts!