Junior Data Analyst Interview Questions: Complete Guide

Milad Bonakdar
Author
Master data analysis fundamentals with essential interview questions covering Excel, SQL, data visualization, statistics basics, and business intelligence tools for junior data analysts.
Introduction
Data analysts transform raw data into actionable insights that drive business decisions. Junior data analysts are expected to have strong skills in Excel, SQL, data visualization tools, and basic statistics to support data-driven decision making.
This guide covers essential interview questions for Junior Data Analysts. We explore Excel functions, SQL queries, data visualization with Tableau and Power BI, statistics fundamentals, and data analysis best practices to help you prepare for your first data analyst role.
Excel Fundamentals (5 Questions)
1. What are the most important Excel functions for data analysis?
Answer: Essential Excel functions every data analyst should know:
- VLOOKUP/XLOOKUP: Lookup values in tables
- IF/IFS: Conditional logic
- SUMIF/SUMIFS: Conditional summation
- COUNTIF/COUNTIFS: Conditional counting
- PIVOT TABLES: Summarize and analyze data
- INDEX/MATCH: More flexible than VLOOKUP
- TEXT functions: LEFT, RIGHT, MID, CONCATENATE
- DATE functions: TODAY, DATEDIF, EOMONTH
# VLOOKUP example
=VLOOKUP(A2, Products!A:C, 3, FALSE)
# SUMIFS example (sum sales where region="East" and product="Widget")
=SUMIFS(Sales!C:C, Sales!A:A, "East", Sales!B:B, "Widget")
# INDEX/MATCH example (more flexible than VLOOKUP)
=INDEX(Prices!C:C, MATCH(A2, Prices!A:A, 0))
# Conditional formatting with IF
=IF(B2>1000, "High", IF(B2>500, "Medium", "Low"))Rarity: Very Common Difficulty: Easy
2. Explain the difference between VLOOKUP and INDEX/MATCH.
Answer:
- VLOOKUP:
- Simpler syntax
- Only looks to the right
- Less flexible
- Slower for large datasets
- INDEX/MATCH:
- More complex syntax
- Can look left or right
- More flexible
- Faster performance
- Can return entire rows/columns
# VLOOKUP - lookup value in column A, return from column C
=VLOOKUP(A2, A:C, 3, FALSE)
# INDEX/MATCH - equivalent but more flexible
=INDEX(C:C, MATCH(A2, A:A, 0))
# INDEX/MATCH advantage: can look left
=INDEX(A:A, MATCH(C2, C:C, 0)) # VLOOKUP can't do thisRarity: Very Common Difficulty: Easy
3. How do you create and use Pivot Tables?
Answer: Pivot Tables summarize large datasets quickly.
- Steps:
- Select data range
- Insert → Pivot Table
- Drag fields to Rows, Columns, Values
- Apply filters and formatting
- Use Cases: Summarize sales by region, analyze trends, create reports
# Pivot Table structure:
Rows: Product Category
Columns: Quarter
Values: Sum of Sales
Filters: Region
# Calculated fields in Pivot Tables
Profit Margin = (Revenue - Cost) / Revenue
# Grouping dates
Right-click date → Group → Select Months/Quarters/YearsRarity: Very Common Difficulty: Easy
4. What is conditional formatting and when would you use it?
Answer: Conditional formatting applies visual formatting based on cell values.
- Use Cases:
- Highlight top/bottom values
- Show data bars or color scales
- Identify duplicates
- Flag outliers
- Create heat maps
# Highlight cells greater than 1000
Select range → Conditional Formatting → Highlight Cells Rules → Greater Than
# Color scale (gradient)
Select range → Conditional Formatting → Color Scales
# Data bars
Select range → Conditional Formatting → Data Bars
# Custom formula
=AND($B2>1000, $C2="Active")Rarity: Common Difficulty: Easy
5. How do you remove duplicates and handle missing data in Excel?
Answer: Data cleaning is essential for accurate analysis.
# Remove duplicates
Data tab → Remove Duplicates → Select columns
# Find duplicates with conditional formatting
Select range → Conditional Formatting → Highlight Cells Rules → Duplicate Values
# Handle missing data
# Option 1: Filter and delete
Filter column → Uncheck (Blanks) → Delete visible rows
# Option 2: Fill with average
=IF(ISBLANK(A2), AVERAGE(A:A), A2)
# Option 3: Fill down
Select range → Ctrl+D (Windows) or Cmd+D (Mac)
# Find and replace blanks
Ctrl+H → Find what: (leave blank) → Replace with: 0 or N/ARarity: Very Common Difficulty: Easy
SQL Basics (5 Questions)
6. Write a SQL query to select all columns from a table.
Answer: Basic SELECT statement retrieves data from tables.
-- Select all columns
SELECT * FROM employees;
-- Select specific columns
SELECT first_name, last_name, salary
FROM employees;
-- Select with alias
SELECT
first_name AS "First Name",
last_name AS "Last Name",
salary AS "Annual Salary"
FROM employees;
-- Select distinct values
SELECT DISTINCT department
FROM employees;
-- Limit results
SELECT * FROM employees
LIMIT 10;Rarity: Very Common Difficulty: Easy
7. How do you filter data using WHERE clause?
Answer: WHERE clause filters rows based on conditions.
-- Simple condition
SELECT * FROM employees
WHERE salary > 50000;
-- Multiple conditions with AND
SELECT * FROM employees
WHERE department = 'Sales' AND salary > 60000;
-- Multiple conditions with OR
SELECT * FROM employees
WHERE department = 'Sales' OR department = 'Marketing';
-- IN operator
SELECT * FROM employees
WHERE department IN ('Sales', 'Marketing', 'IT');
-- BETWEEN operator
SELECT * FROM employees
WHERE salary BETWEEN 50000 AND 80000;
-- LIKE operator (pattern matching)
SELECT * FROM employees
WHERE first_name LIKE 'J%'; -- Starts with J
-- IS NULL
SELECT * FROM employees
WHERE manager_id IS NULL;
-- NOT operator
SELECT * FROM employees
WHERE department NOT IN ('HR', 'Finance');Rarity: Very Common Difficulty: Easy
8. Explain JOIN operations and their types.
Answer: JOINs combine data from multiple tables.
- INNER JOIN: Returns matching rows from both tables
- LEFT JOIN: Returns all rows from left table, matching from right
- RIGHT JOIN: Returns all rows from right table, matching from left
- FULL OUTER JOIN: Returns all rows from both tables
-- INNER JOIN
SELECT e.first_name, e.last_name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id;
-- LEFT JOIN (all employees, even without department)
SELECT e.first_name, e.last_name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id;
-- Multiple joins
SELECT
e.first_name,
d.department_name,
l.city
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id
INNER JOIN locations l ON d.location_id = l.location_id;
-- Self join (employees and their managers)
SELECT
e.first_name AS employee,
m.first_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;Rarity: Very Common Difficulty: Medium
9. How do you use GROUP BY and aggregate functions?
Answer: GROUP BY groups rows and aggregate functions summarize data.
-- Count employees by department
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
-- Average salary by department
SELECT
department,
AVG(salary) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM employees
GROUP BY department;
-- HAVING clause (filter groups)
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000;
-- Multiple grouping columns
SELECT
department,
job_title,
COUNT(*) AS count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department, job_title
ORDER BY department, avg_salary DESC;
-- Common aggregate functions
-- COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT)Rarity: Very Common Difficulty: Medium
10. What is the difference between WHERE and HAVING?
Answer:
- WHERE: Filters rows before grouping
- HAVING: Filters groups after grouping
- WHERE: Cannot use aggregate functions
- HAVING: Can use aggregate functions
-- WHERE - filter before grouping
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE salary > 40000 -- Filter individual rows
GROUP BY department;
-- HAVING - filter after grouping
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000; -- Filter groups
-- Both together
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date > '2020-01-01' -- Filter rows first
GROUP BY department
HAVING COUNT(*) > 5; -- Then filter groups
-- This would be WRONG:
-- SELECT department FROM employees
-- WHERE COUNT(*) > 5; -- Error: can't use aggregate in WHERERarity: Very Common Difficulty: Easy
Data Visualization (4 Questions)
11. What are the key principles of effective data visualization?
Answer: Good visualizations communicate insights clearly.
- Principles:
- Choose the right chart type (bar for comparison, line for trends, pie for parts of whole)
- Keep it simple (avoid clutter)
- Use appropriate colors (consistent, accessible)
- Label clearly (titles, axes, legends)
- Tell a story (highlight key insights)
- Consider audience (technical vs non-technical)
Rarity: Common Difficulty: Easy
12. When would you use a bar chart vs a line chart?
Answer: Different chart types serve different purposes:
- Bar Chart:
- Compare categories
- Discrete data
- Examples: Sales by region, product comparison
- Line Chart:
- Show trends over time
- Continuous data
- Examples: Monthly revenue, stock prices
- Other Charts:
- Pie Chart: Parts of a whole (use sparingly)
- Scatter Plot: Relationship between two variables
- Histogram: Distribution of continuous data
Rarity: Common Difficulty: Easy
13. What is Tableau and what are its key features?
Answer: Tableau is a leading data visualization and business intelligence tool.
- Key Features:
- Drag-and-drop interface (no coding required)
- Connect to multiple data sources (databases, Excel, cloud)
- Interactive dashboards
- Real-time data updates
- Calculated fields and parameters
- Sharing and collaboration
- Common Tasks:
- Create worksheets (individual visualizations)
- Build dashboards (multiple visualizations)
- Apply filters and parameters
- Create calculated fields
- Publish to Tableau Server/Online
Rarity: Very Common Difficulty: Easy
14. What is the difference between Tableau and Power BI?
Answer: Both are popular BI tools with different strengths:
- Tableau:
- Better for complex visualizations
- More intuitive for beginners
- Stronger data visualization capabilities
- More expensive
- Power BI:
- Better Microsoft integration
- More affordable
- Strong for Excel users
- Better for data modeling
- DAX for calculations
- Choice depends on:
- Budget
- Existing tech stack
- Complexity of visualizations
- Team skills
Rarity: Common Difficulty: Easy
Statistics & Analysis (4 Questions)
15. What measures of central tendency do you know?
Answer: Measures of central tendency describe the center of a dataset:
- Mean: Average (sum / count)
- Sensitive to outliers
- Use for normally distributed data
- Median: Middle value when sorted
- Robust to outliers
- Use for skewed data
- Mode: Most frequent value
- Use for categorical data
import numpy as np
data = [1, 2, 2, 3, 4, 5, 100]
mean = np.mean(data) # 16.71 (affected by outlier 100)
median = np.median(data) # 3 (not affected by outlier)
# mode = 2 (most frequent)
print(f"Mean: {mean}")
print(f"Median: {median}")Rarity: Very Common Difficulty: Easy
16. How do you identify outliers in a dataset?
Answer: Outliers are data points that differ significantly from other observations.
- Methods:
- Visual: Box plots, scatter plots
- Statistical:
- IQR method (1.5 × IQR beyond Q1/Q3)
- Z-score (|z| > 3)
- Standard deviation (beyond 2-3 std devs)
import numpy as np
data = np.array([10, 12, 13, 12, 11, 14, 13, 15, 100, 12])
# IQR method
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = data[(data < lower_bound) | (data > upper_bound)]
print(f"Outliers: {outliers}") # [100]
# Z-score method
mean = np.mean(data)
std = np.std(data)
z_scores = np.abs((data - mean) / std)
outliers_z = data[z_scores > 3]
print(f"Outliers (Z-score): {outliers_z}")Rarity: Common Difficulty: Medium
17. What is the difference between correlation and causation?
Answer:
- Correlation: Statistical relationship between variables
- Measured by correlation coefficient (-1 to 1)
- Doesn't imply causation
- Causation: One variable directly causes changes in another
- Requires controlled experiments
- Correlation is necessary but not sufficient
Examples:
- Ice cream sales and drowning deaths are correlated (both increase in summer)
- But ice cream doesn't cause drowning (confounding variable: temperature)
-- Calculate correlation in SQL (simplified)
SELECT
CORR(sales, temperature) AS correlation
FROM daily_data;
-- Positive correlation: both increase together
-- Negative correlation: one increases, other decreases
-- Zero correlation: no relationshipRarity: Very Common Difficulty: Easy
18. How do you calculate percentage change?
Answer: Percentage change measures relative change between two values.
# Formula
Percentage Change = ((New Value - Old Value) / Old Value) × 100
# Excel formula
=(B2-A2)/A2*100
# Example:
Old Value: 100
New Value: 120
Change: (120-100)/100 = 0.20 = 20% increase
# Year-over-Year growth
=(Sales_2023 - Sales_2022) / Sales_2022 * 100-- SQL percentage change
SELECT
year,
revenue,
LAG(revenue) OVER (ORDER BY year) AS prev_year_revenue,
((revenue - LAG(revenue) OVER (ORDER BY year)) /
LAG(revenue) OVER (ORDER BY year) * 100) AS pct_change
FROM annual_sales;Rarity: Very Common Difficulty: Easy
Business Intelligence & Reporting (2 Questions)
19. What is a KPI and how do you choose the right ones?
Answer: KPI (Key Performance Indicator) is a measurable value that shows how effectively objectives are being achieved.
- Characteristics of good KPIs:
- Specific: Clear and well-defined
- Measurable: Quantifiable
- Achievable: Realistic
- Relevant: Aligned with business goals
- Time-bound: Has a timeframe
- Examples:
- Sales: Monthly revenue, conversion rate
- Marketing: Customer acquisition cost, ROI
- Operations: Order fulfillment time, error rate
- Customer: Satisfaction score, retention rate
Rarity: Common Difficulty: Easy
20. How do you present data insights to non-technical stakeholders?
Answer: Effective communication is crucial for data analysts.
- Best Practices:
- Start with the conclusion (what they need to know)
- Use simple visualizations (avoid complex charts)
- Tell a story (context, insight, recommendation)
- Avoid jargon (explain technical terms)
- Focus on business impact (revenue, cost, efficiency)
- Provide actionable recommendations
- Be prepared for questions
- Structure:
- Executive summary
- Key findings
- Supporting data/visualizations
- Recommendations
- Next steps
Rarity: Common Difficulty: Medium




