
Python Institute PCED Sample Questions with Explanations


These original sample questions are designed to help you check how the exam topics appear in decision-style prompts. They are not taken from the live exam.

Use these sample questions as a guided self-assessment for Certified Entry-Level Python for Data Science (PCED) topics such as Python data handling, tabular data, missing values, summary statistics, visualization, train/test split, model evaluation, and responsible data work. The prompts focus on data-shape decisions rather than memorized library names.

Where these questions fit in the PCED guide

The sample set below is part of the Python Institute PCED guide path.

PCED data-science sample questions

Work through each prompt before opening the explanation. PCED questions reward careful reasoning about rows, columns, missing data, leakage, metrics, and how a chart or statistic supports the question being asked.


Question 1

Topic: Missing values before averaging

A dataset has a delivery_minutes column. Some rows are missing the value because the delivery was canceled before dispatch. A learner wants to report the typical delivery time for completed deliveries. What should they do first?

  • A. Replace every missing value with 0 because canceled deliveries took no time.
  • B. Remove or filter canceled deliveries from the completed-delivery calculation, then summarize the remaining valid delivery times.
  • C. Replace missing values with the largest observed delivery time to be conservative.
  • D. Convert the column to text so missing values do not affect the calculation.

Best answer: B

Explanation: The target question is typical time for completed deliveries. Canceled rows are not completed deliveries, so they should not be forced into the numeric calculation. Filtering to the relevant population before summarizing is stronger than filling values without understanding why they are missing.

Why the other choices are weaker:

  • A changes the meaning of the average by treating non-deliveries as zero-minute deliveries.
  • C adds artificial extreme values and distorts the summary.
  • D avoids numeric calculation instead of preparing the data correctly.
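The filter-then-summarize approach in option B can be sketched in plain Python. This is a minimal illustration under stated assumptions. The rows are made up, and the status field is a hypothetical column standing in for whatever records why delivery_minutes is missing. The sketch also computes option A's zero-filled mean so the distortion is visible.

```python
from statistics import mean, median

# Hypothetical rows; "status" is an assumed column that explains
# why delivery_minutes is missing for some orders.
deliveries = [
    {"order_id": 1, "status": "completed", "delivery_minutes": 32},
    {"order_id": 2, "status": "canceled", "delivery_minutes": None},
    {"order_id": 3, "status": "completed", "delivery_minutes": 45},
    {"order_id": 4, "status": "completed", "delivery_minutes": 28},
    {"order_id": 5, "status": "canceled", "delivery_minutes": None},
]

def completed_minutes(rows):
    """Keep only completed deliveries that actually have a recorded time."""
    return [
        row["delivery_minutes"]
        for row in rows
        if row["status"] == "completed" and row["delivery_minutes"] is not None
    ]

valid = completed_minutes(deliveries)
print(mean(valid), median(valid))  # completed-only mean 35, median 32

# Option A for comparison: canceled orders counted as 0-minute deliveries.
zero_filled = [
    0 if row["delivery_minutes"] is None else row["delivery_minutes"]
    for row in deliveries
]
print(mean(zero_filled))  # 21, pulled down by orders that were never delivered
```

Either the mean or the median of the filtered values can serve as a "typical" time. The key step is choosing the population before summarizing it.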

What this tests: Missing-value meaning, population selection, and summary-statistic validity.

Related topics: Missing data; Filtering; Averages; Data cleaning; Data meaning


Question 2

Topic: Train/test split and leakage

A student builds a simple model to predict whether a customer will churn. They scale the full dataset, select features using all rows, and then split into training and test sets. Why is this a problem?

  • A. The test set has influenced preprocessing and feature selection, so the evaluation may be overly optimistic.
  • B. Scaling is always forbidden in classification problems.
  • C. Train/test splits are useful only for image data.
  • D. Feature selection must always be done manually without software.

Best answer: A

Explanation: The test set should simulate unseen data. If preprocessing or feature selection learns from the full dataset before the split, information from the test rows can leak into model development. That can make test performance look better than real future performance.

Why the other choices are weaker:

  • B is false; scaling can be appropriate depending on the model.
  • C is false; train/test evaluation applies broadly.
  • D is false; software can help with feature selection when it is applied correctly inside the training workflow.
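A leak-free order of operations can be sketched without any library. The idea is to split first, then learn scaling statistics from the training rows only, and finally apply those same statistics to the test rows. The customer data, column names, and helper functions here are hypothetical illustrations, not a specific library's API. Standardization with the mean and population standard deviation is one common scaling choice, assumed for the example.

```python
import random
from statistics import mean, pstdev

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split them before any preprocessing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_scaler(train_rows, columns):
    """Learn per-column mean and standard deviation from training rows only."""
    params = {}
    for col in columns:
        values = [row[col] for row in train_rows]
        spread = pstdev(values)
        params[col] = (mean(values), spread if spread > 0 else 1.0)
    return params

def transform(rows, params):
    """Apply training-set statistics to any rows, including test rows."""
    return [
        {**row, **{col: (row[col] - mu) / sd for col, (mu, sd) in params.items()}}
        for row in rows
    ]

# Hypothetical churn data; feature names are assumptions for illustration.
customers = [
    {
        "customer_id": i,
        "monthly_charges": 20 + 5 * i,
        "tenure_months": 60 - 4 * i,
        "churned": int(i > 8),
    }
    for i in range(12)
]

train, test = train_test_split(customers)
params = fit_scaler(train, ["monthly_charges", "tenure_months"])
train_scaled = transform(train, params)
test_scaled = transform(test, params)  # test rows never influenced params
```

Feature selection follows the same rule. Any scoring used to choose features should be computed on the training rows, and the chosen columns are then applied unchanged to the test rows.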

What this tests: Data leakage, train/test workflow, preprocessing order, and model-evaluation discipline.

Related topics: Train/test split; Leakage; Preprocessing; Feature selection; Evaluation


Question 3

Topic: Choosing a visualization

A manager wants to see whether monthly sales have a seasonal pattern over the last three years. Which chart is the best starting point?

  • A. A pie chart with one slice for each month across all years.
  • B. A scatter plot of customer ID versus invoice number.
  • C. A line chart with time on the x-axis and sales on the y-axis, optionally grouped or colored by year.
  • D. A single table showing only the maximum monthly sale.

Best answer: C

Explanation: Seasonality is a time-based pattern. A line chart preserves month order and makes repeated rises or drops easier to compare across years. Grouping or coloring by year can help reveal recurring monthly behavior.

Why the other choices are weaker:

  • A makes comparisons crowded and loses the sequential nature of time.
  • B plots identifiers rather than the business measure and time pattern.
  • D hides nearly all temporal information.
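Most of the work behind option C is a data-shape step: reshaping the sales records into one ordered 12-month series per year. The sketch below assumes hypothetical (year, month, sales) tuples with made-up figures. The plotting helper uses matplotlib's `plt.subplots` and `Axes.plot`. It is defined but not called, so the reshaping runs even where matplotlib is not installed.

```python
from collections import defaultdict

def sales_by_year(records):
    """Reshape (year, month, sales) records into one 12-month series per year."""
    # NaN marks a month with no data; matplotlib draws a gap there.
    series = defaultdict(lambda: [float("nan")] * 12)
    for year, month, sales in records:
        series[year][month - 1] = sales
    return dict(sorted(series.items()))

def plot_by_year(series):
    """Draw one line per year with months in order on the x-axis (needs matplotlib)."""
    import matplotlib.pyplot as plt

    months = list(range(1, 13))
    fig, ax = plt.subplots()
    for year, values in series.items():
        ax.plot(months, values, marker="o", label=str(year))
    ax.set_xticks(months)
    ax.set_xlabel("Month")
    ax.set_ylabel("Sales")
    ax.legend(title="Year")
    plt.show()

# Hypothetical monthly sales with a year-end bump; all figures are made up.
records = [
    (year, month, 100 + 10 * (year - 2023) + (30 if month == 11 else 60 if month == 12 else 0))
    for year in (2023, 2024, 2025)
    for month in range(1, 13)
]
series = sales_by_year(records)
```

Calling `plot_by_year(series)` with matplotlib installed draws three overlaid lines. Overlaying one line per year on the same month axis makes a recurring November and December rise easy to compare.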

What this tests: Matching a visualization to a question, especially time-series pattern recognition.

Related topics: Visualization; Time series; Seasonality; Line charts; Exploratory analysis


Question 4

Topic: Interpreting correlation

A learner finds that advertising spend and revenue have a strong positive correlation in a small dataset. What is the safest conclusion?

  • A. Advertising spend definitely caused the revenue increase.
  • B. Revenue definitely caused the advertising spend increase.
  • C. The relationship is impossible because correlation must be negative.
  • D. The variables move together in the observed data, but causation requires additional evidence or experimental design.

Best answer: D

Explanation: Correlation describes association, not causation. A positive correlation can be useful evidence, but it does not prove which variable caused the other or whether another factor influenced both.

Why the other choices are weaker:

  • A and B assert causal direction without enough evidence.
  • C is false because correlations can be positive, negative, or near zero.
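A small computation shows why option D is the safe reading. In the made-up data below, both advertising spend and revenue are generated from a third variable, seasonal demand, so neither one causes the other. The Pearson coefficient still comes out strongly positive. The `pearson_r` helper is written out by hand for transparency. It is an illustrative sketch, not a specific library's function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)

# Made-up data: demand drives both columns; ad spend does not drive revenue here.
demand = [10, 12, 15, 20, 30, 45]
ad_spend = [2 * d + e for d, e in zip(demand, [1, -1, 2, 0, -2, 1])]
revenue = [50 * d + e for d, e in zip(demand, [30, -20, 10, -40, 25, 0])]

r = pearson_r(ad_spend, revenue)
print(round(r, 3))  # strongly positive, yet neither column causes the other
```

The coefficient only measures how closely the two columns move together. Establishing causation would require additional evidence, such as a controlled experiment that varies ad spend while holding demand fixed.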

What this tests: Statistical interpretation, causation limits, and careful communication of findings.

Related topics: Correlation; Causation; Statistics; Interpretation; Evidence

Independent study note

Tech Exam Lexicon and IT Mastery are independent study tools. They are not affiliated with, endorsed by, or sponsored by Python Institute or any certification body.

Revised on Sunday, May 10, 2026