November 22, 2025
12 min read

Junior Data Analyst Interview Questions: SQL, Excel & BI

interview
career-advice
job-search
entry-level
Junior Data Analyst Interview Questions: SQL, Excel & BI
Milad Bonakdar

Milad Bonakdar

Author

Practice junior data analyst interview questions on SQL, Excel, dashboards, statistics, KPIs, and explaining insights clearly.


Junior Data Analyst Interview Questions: What to Prepare

A junior data analyst interview usually tests whether you can clean messy data, write basic SQL, summarize data in Excel, build a clear dashboard, and explain business impact without overcomplicating the answer. Prepare short examples that show how you found an issue, checked the data, chose the right metric, and explained the result.

This guide focuses on the practical questions most likely to come up for entry-level analyst roles: Excel lookup and pivot work, SQL filtering and joins, Tableau or Power BI dashboards, basic statistics, KPIs, and stakeholder communication. Use each answer as a base, then add one real project or coursework example.

How to use this guide

Pick one dataset project before the interview and connect your answers to it. Interviewers want to hear not only that you know a function or query, but also when you would use it, how you would check the output, and how you would explain the finding to a non-technical team.


Excel Fundamentals (5 Questions)

1. What are the most important Excel functions for data analysis?

Answer: For junior data analyst interviews, focus on functions that help you clean, join, summarize, and check data. Know what each one is for, not just the syntax.

  • XLOOKUP: Modern lookup across a table or range; it can return values from either side of the lookup column.
  • VLOOKUP: Still useful for legacy spreadsheets and older workflows.
  • INDEX/MATCH: Flexible lookup pattern, especially when you need more control.
  • IF/IFS and AND/OR: Business rules and conditional logic.
  • SUMIFS/COUNTIFS/AVERAGEIFS: Conditional summaries by segment, date, region, or status.
  • Pivot Tables: Fast summaries for categories, trends, and checks.
  • TEXT and DATE functions: Clean inconsistent fields and build reporting periods.
# XLOOKUP example: find product price by SKU
=XLOOKUP(A2, Products!A:A, Products!C:C, "Not found")

# SUMIFS example: sum sales where region="East" and product="Widget"
=SUMIFS(Sales!C:C, Sales!A:A, "East", Sales!B:B, "Widget")

# INDEX/MATCH example
=INDEX(Prices!C:C, MATCH(A2, Prices!A:A, 0))

# Conditional business label
=IF(B2>1000, "High", IF(B2>500, "Medium", "Low"))

In an interview, add a practical sentence: “I would use XLOOKUP to enrich a sales table with product categories, then use a pivot table to check revenue by category before building the dashboard.”

Rarity: Very Common Difficulty: Easy


2. Explain the difference between XLOOKUP, VLOOKUP, and INDEX/MATCH.

Answer: These are lookup methods. The best answer explains when each one is useful.

  • XLOOKUP: Easiest modern option when available. It can search one range and return a value from another range, including columns to the left or right.
  • VLOOKUP: Common in older spreadsheets. It is simple, but it normally looks from the first column of a range to columns on the right and can break when column positions change.
  • INDEX/MATCH: More flexible than VLOOKUP and still useful when you need a lookup pattern that is explicit and portable across older files.
# XLOOKUP - lookup SKU in column A, return price from column C
=XLOOKUP(A2, A:A, C:C, "Not found")

# VLOOKUP - lookup value in first column of A:C, return column 3
=VLOOKUP(A2, A:C, 3, FALSE)

# INDEX/MATCH - equivalent pattern
=INDEX(C:C, MATCH(A2, A:A, 0))

A strong junior answer also mentions validation: after a lookup, check unmatched rows, duplicate keys, and whether the lookup key has consistent formatting.

Rarity: Very Common Difficulty: Easy


3. How do you create and use Pivot Tables?

Answer: Pivot Tables summarize large datasets quickly.

  • Steps:
    1. Select data range
    2. Insert → Pivot Table
    3. Drag fields to Rows, Columns, Values
    4. Apply filters and formatting
  • Use Cases: Summarize sales by region, analyze trends, create reports
# Pivot Table structure:
Rows: Product Category
Columns: Quarter
Values: Sum of Sales
Filters: Region

# Calculated fields in Pivot Tables
Profit Margin = (Revenue - Cost) / Revenue

# Grouping dates
Right-click date → Group → Select Months/Quarters/Years

Rarity: Very Common Difficulty: Easy


4. What is conditional formatting and when would you use it?

Answer: Conditional formatting applies visual formatting based on cell values.

  • Use Cases:
    • Highlight top/bottom values
    • Show data bars or color scales
    • Identify duplicates
    • Flag outliers
    • Create heat maps
# Highlight cells greater than 1000
Select range → Conditional Formatting → Highlight Cells Rules → Greater Than

# Color scale (gradient)
Select range → Conditional Formatting → Color Scales

# Data bars
Select range → Conditional Formatting → Data Bars

# Custom formula
=AND($B2>1000, $C2="Active")

Rarity: Common Difficulty: Easy


5. How do you remove duplicates and handle missing data in Excel?

Answer: Data cleaning is essential for accurate analysis.

# Remove duplicates
Data tab → Remove Duplicates → Select columns

# Find duplicates with conditional formatting
Select range → Conditional Formatting → Highlight Cells Rules → Duplicate Values

# Handle missing data
# Option 1: Filter and delete
Filter column → Uncheck (Blanks) → Delete visible rows

# Option 2: Fill with average
=IF(ISBLANK(A2), AVERAGE(A:A), A2)

# Option 3: Fill down
Select range → Ctrl+D (Windows) or Cmd+D (Mac)

# Find and replace blanks
Ctrl+H → Find what: (leave blank) → Replace with: 0 or N/A

Rarity: Very Common Difficulty: Easy


SQL Basics (5 Questions)

6. Write a SQL query to select all columns from a table.

Answer: Basic SELECT statement retrieves data from tables.

-- Select all columns
SELECT * FROM employees;

-- Select specific columns
SELECT first_name, last_name, salary
FROM employees;

-- Select with alias
SELECT 
    first_name AS "First Name",
    last_name AS "Last Name",
    salary AS "Annual Salary"
FROM employees;

-- Select distinct values
SELECT DISTINCT department
FROM employees;

-- Limit results
SELECT * FROM employees
LIMIT 10;

Rarity: Very Common Difficulty: Easy


7. How do you filter data using WHERE clause?

Answer: WHERE clause filters rows based on conditions.

-- Simple condition
SELECT * FROM employees
WHERE salary > 50000;

-- Multiple conditions with AND
SELECT * FROM employees
WHERE department = 'Sales' AND salary > 60000;

-- Multiple conditions with OR
SELECT * FROM employees
WHERE department = 'Sales' OR department = 'Marketing';

-- IN operator
SELECT * FROM employees
WHERE department IN ('Sales', 'Marketing', 'IT');

-- BETWEEN operator
SELECT * FROM employees
WHERE salary BETWEEN 50000 AND 80000;

-- LIKE operator (pattern matching)
SELECT * FROM employees
WHERE first_name LIKE 'J%';  -- Starts with J

-- IS NULL
SELECT * FROM employees
WHERE manager_id IS NULL;

-- NOT operator
SELECT * FROM employees
WHERE department NOT IN ('HR', 'Finance');

Rarity: Very Common Difficulty: Easy


8. Explain JOIN operations and their types.

Answer: JOINs combine data from multiple tables.

  • INNER JOIN: Returns matching rows from both tables
  • LEFT JOIN: Returns all rows from left table, matching from right
  • RIGHT JOIN: Returns all rows from right table, matching from left
  • FULL OUTER JOIN: Returns all rows from both tables
-- INNER JOIN
SELECT e.first_name, e.last_name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id;

-- LEFT JOIN (all employees, even without department)
SELECT e.first_name, e.last_name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id;

-- Multiple joins
SELECT 
    e.first_name,
    d.department_name,
    l.city
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id
INNER JOIN locations l ON d.location_id = l.location_id;

-- Self join (employees and their managers)
SELECT 
    e.first_name AS employee,
    m.first_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;

Rarity: Very Common Difficulty: Medium


9. How do you use GROUP BY and aggregate functions?

Answer: GROUP BY groups rows and aggregate functions summarize data.

-- Count employees by department
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;

-- Average salary by department
SELECT 
    department,
    AVG(salary) AS avg_salary,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary
FROM employees
GROUP BY department;

-- HAVING clause (filter groups)
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000;

-- Multiple grouping columns
SELECT 
    department,
    job_title,
    COUNT(*) AS count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department, job_title
ORDER BY department, avg_salary DESC;

-- Common aggregate functions
-- COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT)

Rarity: Very Common Difficulty: Medium


10. What is the difference between WHERE and HAVING?

Answer:

  • WHERE: Filters rows before grouping
  • HAVING: Filters groups after grouping
  • WHERE: Cannot use aggregate functions
  • HAVING: Can use aggregate functions
-- WHERE - filter before grouping
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE salary > 40000  -- Filter individual rows
GROUP BY department;

-- HAVING - filter after grouping
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000;  -- Filter groups

-- Both together
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date > '2020-01-01'  -- Filter rows first
GROUP BY department
HAVING COUNT(*) > 5;  -- Then filter groups

-- This would be WRONG:
-- SELECT department FROM employees
-- WHERE COUNT(*) > 5;  -- Error: can't use aggregate in WHERE

Rarity: Very Common Difficulty: Easy


Data Visualization (4 Questions)

11. What are the key principles of effective data visualization?

Answer: Good visualizations communicate insights clearly.

  • Principles:
    • Choose the right chart type (bar for comparison, line for trends, pie for parts of whole)
    • Keep it simple (avoid clutter)
    • Use appropriate colors (consistent, accessible)
    • Label clearly (titles, axes, legends)
    • Tell a story (highlight key insights)
    • Consider audience (technical vs non-technical)
Loading diagram...

Rarity: Common Difficulty: Easy


12. When would you use a bar chart vs a line chart?

Answer: Different chart types serve different purposes:

  • Bar Chart:
    • Compare categories
    • Discrete data
    • Examples: Sales by region, product comparison
  • Line Chart:
    • Show trends over time
    • Continuous data
    • Examples: Monthly revenue, stock prices
  • Other Charts:
    • Pie Chart: Parts of a whole (use sparingly)
    • Scatter Plot: Relationship between two variables
    • Histogram: Distribution of continuous data

Rarity: Common Difficulty: Easy


13. What is Tableau and what are its key features?

Answer: Tableau is a data visualization and business intelligence tool used to explore data and share interactive dashboards. A junior analyst does not need to know every advanced feature, but should understand the core workflow.

  • Connect to data: Excel files, databases, cloud sources, or published data sources.
  • Build views: Drag dimensions and measures into charts, tables, maps, and other visuals.
  • Create dashboards: Combine multiple worksheets, filters, and interactive controls into one view.
  • Use calculated fields: Create new metrics when the source data does not already contain the needed field.
  • Use parameters and filters: Let viewers change what they see without editing the workbook.
  • Share results: Publish dashboards and explain the insight, not just the chart.

A good interview answer connects the tool to a business question: “I would use Tableau to show monthly churn by customer segment, add filters for region and plan type, and validate the dashboard totals against the source query before sharing it.”

Rarity: Very Common Difficulty: Easy


14. What is the difference between Tableau and Power BI?

Answer: Both tools help analysts build reports and dashboards, but interviewers want to hear how you choose based on the team and data environment.

  • Tableau: Strong for flexible visual exploration, dashboard building, calculated fields, parameters, and sharing interactive views. It is often a good fit when the main need is visual analysis across mixed data sources.
  • Power BI: Strong in Microsoft environments, especially when teams already use Excel, Fabric, Microsoft 365, Power Query, semantic models, and DAX. It is often a good fit for governed reporting and recurring business dashboards.
  • How to answer: Do not claim one is always better. Explain the trade-off: audience, existing stack, data model complexity, refresh needs, permissions, and who will maintain the dashboard.

A strong junior answer: “I can use either tool for basic dashboards. I would choose Power BI if the company is already centered on Microsoft data workflows, and Tableau when the team needs more flexible visual exploration. In both cases, I would start with the business question, clean the data, define the metric, and validate totals before publishing.”

Statistics & Analysis (4 Questions)

15. What measures of central tendency do you know?

Answer: Measures of central tendency describe the center of a dataset:

  • Mean: Average (sum / count)
    • Sensitive to outliers
    • Use for normally distributed data
  • Median: Middle value when sorted
    • Robust to outliers
    • Use for skewed data
  • Mode: Most frequent value
    • Use for categorical data
import numpy as np

data = [1, 2, 2, 3, 4, 5, 100]

mean = np.mean(data)  # 16.71 (affected by outlier 100)
median = np.median(data)  # 3 (not affected by outlier)
# mode = 2 (most frequent)

print(f"Mean: {mean}")
print(f"Median: {median}")

Rarity: Very Common Difficulty: Easy


16. How do you identify outliers in a dataset?

Answer: Outliers are data points that differ significantly from other observations.

  • Methods:
    • Visual: Box plots, scatter plots
    • Statistical:
      • IQR method (1.5 × IQR beyond Q1/Q3)
      • Z-score (|z| > 3)
      • Standard deviation (beyond 2-3 std devs)
import numpy as np

data = np.array([10, 12, 13, 12, 11, 14, 13, 15, 100, 12])

# IQR method
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

outliers = data[(data < lower_bound) | (data > upper_bound)]
print(f"Outliers: {outliers}")  # [100]

# Z-score method
mean = np.mean(data)
std = np.std(data)
z_scores = np.abs((data - mean) / std)
outliers_z = data[z_scores > 3]
print(f"Outliers (Z-score): {outliers_z}")

Rarity: Common Difficulty: Medium


17. What is the difference between correlation and causation?

Answer:

  • Correlation: Statistical relationship between variables
    • Measured by correlation coefficient (-1 to 1)
    • Doesn't imply causation
  • Causation: One variable directly causes changes in another
    • Requires controlled experiments
    • Correlation is necessary but not sufficient

Examples:

  • Ice cream sales and drowning deaths are correlated (both increase in summer)
  • But ice cream doesn't cause drowning (confounding variable: temperature)
-- Calculate correlation in SQL (simplified)
SELECT 
    CORR(sales, temperature) AS correlation
FROM daily_data;

-- Positive correlation: both increase together
-- Negative correlation: one increases, other decreases
-- Zero correlation: no relationship

Rarity: Very Common Difficulty: Easy


18. How do you calculate percentage change?

Answer: Percentage change measures relative change between two values.

# Formula
Percentage Change = ((New Value - Old Value) / Old Value) × 100

# Excel formula
=(B2-A2)/A2*100

# Example:
Old Value: 100
New Value: 120
Change: (120-100)/100 = 0.20 = 20% increase

# Year-over-Year growth
=(Sales_2023 - Sales_2022) / Sales_2022 * 100
-- SQL percentage change
SELECT 
    year,
    revenue,
    LAG(revenue) OVER (ORDER BY year) AS prev_year_revenue,
    ((revenue - LAG(revenue) OVER (ORDER BY year)) / 
     LAG(revenue) OVER (ORDER BY year) * 100) AS pct_change
FROM annual_sales;

Rarity: Very Common Difficulty: Easy


Business Intelligence & Reporting (2 Questions)

19. What is a KPI and how do you choose the right ones?

Answer: KPI (Key Performance Indicator) is a measurable value that shows how effectively objectives are being achieved.

  • Characteristics of good KPIs:
    • Specific: Clear and well-defined
    • Measurable: Quantifiable
    • Achievable: Realistic
    • Relevant: Aligned with business goals
    • Time-bound: Has a timeframe
  • Examples:
    • Sales: Monthly revenue, conversion rate
    • Marketing: Customer acquisition cost, ROI
    • Operations: Order fulfillment time, error rate
    • Customer: Satisfaction score, retention rate

Rarity: Common Difficulty: Easy


20. How do you present data insights to non-technical stakeholders?

Answer: Effective communication is crucial for data analysts.

  • Best Practices:
    • Start with the conclusion (what they need to know)
    • Use simple visualizations (avoid complex charts)
    • Tell a story (context, insight, recommendation)
    • Avoid jargon (explain technical terms)
    • Focus on business impact (revenue, cost, efficiency)
    • Provide actionable recommendations
    • Be prepared for questions
  • Structure:
    1. Executive summary
    2. Key findings
    3. Supporting data/visualizations
    4. Recommendations
    5. Next steps

Rarity: Common Difficulty: Medium


Newsletter subscription

Weekly career tips that actually work

Get the latest insights delivered straight to your inbox

Your Next Interview is Just One Resume Away

Create a professional, optimized resume in minutes. No design skills needed—just proven results.

Create my resume

Share this post

Cut Your Resume Writing Time by 90%

The average job seeker spends 3+ hours formatting a resume. Our AI does it in under 15 minutes, getting you to the application phase 12x faster.