Python Institute PCED sample questions with explanations, traps, topic labels, and IT Mastery route links.
These original sample questions are designed to help you check how the exam topics appear in decision-style prompts. They are not taken from the live exam.
Use these sample questions as a guided self-assessment for Certified Entry-Level Python for Data Science (PCED) topics such as Python data handling, tabular data, missing values, summary statistics, visualization, train/test split, model evaluation, and responsible data work. The prompts focus on data-shape decisions rather than memorized library names.
The sample set below is part of the Python Institute PCED guide path.
Work through each prompt before opening the explanation. PCED questions reward careful reasoning about rows, columns, missing data, leakage, metrics, and how a chart or statistic supports the question being asked.
Topic: Missing values before averaging
A dataset has a delivery_minutes column. Some rows are missing the value because the delivery was canceled before dispatch. A learner wants to report the typical delivery time for completed deliveries. What should they do first?
0 because canceled deliveries took no time.
Best answer: B
Explanation: The question asks for the typical time of completed deliveries. Canceled rows are not completed deliveries, so they should not be forced into the numeric calculation. Filtering to the relevant population before summarizing is more defensible than filling values without understanding why they are missing.
Why the other choices are weaker:
What this tests: Missing-value meaning, population selection, and summary-statistic validity.
Related topics: Missing data; Filtering; Averages; Data cleaning; Data meaning
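As a rough illustration of the filtering idea in this question, the sketch below uses only the Python standard library. The rows, the `status` field, and the use of `None` for a missing value are all made-up assumptions for this example, not part of any exam material.

```python
from statistics import mean, median

# Hypothetical rows; None marks a delivery canceled before dispatch.
deliveries = [
    {"order_id": 1, "status": "completed", "delivery_minutes": 30},
    {"order_id": 2, "status": "canceled", "delivery_minutes": None},
    {"order_id": 3, "status": "completed", "delivery_minutes": 45},
    {"order_id": 4, "status": "canceled", "delivery_minutes": None},
    {"order_id": 5, "status": "completed", "delivery_minutes": 60},
]

# Keep only the population the question is about: completed deliveries.
completed_minutes = [
    row["delivery_minutes"]
    for row in deliveries
    if row["status"] == "completed" and row["delivery_minutes"] is not None
]

typical_mean = mean(completed_minutes)
typical_median = median(completed_minutes)

# For contrast: filling canceled rows with 0 pulls the average down,
# because it treats deliveries that never happened as instant ones.
zero_filled = [row["delivery_minutes"] or 0 for row in deliveries]
zero_filled_mean = mean(zero_filled)

print(typical_mean, typical_median, zero_filled_mean)
```

With this toy data the filtered mean is 45 minutes, while the zero-filled mean drops to 27, which answers a different question than the one asked.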
Topic: Train/test split and leakage
A student builds a simple model to predict whether a customer will churn. They scale the full dataset, select features using all rows, and then split into training and test sets. Why is this a problem?
Best answer: A
Explanation: The test set should simulate unseen data. If preprocessing or feature selection learns from the full dataset before the split, information from the test rows can leak into model development. That can make test performance look better than real future performance.
Why the other choices are weaker:
What this tests: Data leakage, train/test workflow, preprocessing order, and model-evaluation discipline.
Related topics: Train/test split; Leakage; Preprocessing; Feature selection; Evaluation
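The ordering described in this question can be sketched with plain Python. This is a minimal illustration under stated assumptions: the feature values are invented, the split is a simple slice of already-shuffled data, and `scale` is a hypothetical helper standing in for whatever preprocessing step a real workflow would use.

```python
from statistics import mean, pstdev

# Hypothetical feature values, assumed to be shuffled already.
values = [10.0, 12.0, 14.0, 16.0, 18.0, 100.0]

# 1. Split first, so the test rows stay unseen during development.
train, test = values[:4], values[4:]

# 2. Learn the scaling parameters from the training rows only.
train_mean = mean(train)
train_std = pstdev(train)

def scale(xs):
    """Standardize values using statistics learned from the training rows."""
    return [(x - train_mean) / train_std for x in xs]

# 3. Apply the same training-derived parameters to both sets.
train_scaled = scale(train)
test_scaled = scale(test)

# For contrast: computing the mean on all rows lets the test data
# influence preprocessing, which is the leakage the question describes.
leaky_mean = mean(values)

print(train_mean, leaky_mean)
```

Here the training mean is 13.0, while the mean over all rows is much higher because of a test-set value the model should never have seen.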
Topic: Choosing a visualization
A manager wants to see whether monthly sales have a seasonal pattern over the last three years. Which chart is the best starting point?
Best answer: C
Explanation: Seasonality is a time-based pattern. A line chart preserves month order and makes repeated rises or drops easier to compare across years. Grouping or coloring by year can help reveal recurring monthly behavior.
Why the other choices are weaker:
What this tests: Matching a visualization to a question, especially time-series pattern recognition.
Related topics: Visualization; Time series; Seasonality; Line charts; Exploratory analysis
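Before drawing a line chart like the one described here, the data usually needs to be arranged as one month-ordered series per year. The sketch below shows only that reshaping step with the standard library; the records are invented, and the actual plotting call is left to whichever charting tool is in use.

```python
from collections import defaultdict

# Hypothetical (year, month, sales) records, deliberately out of order.
records = [
    (2023, 12, 180), (2022, 1, 90), (2022, 12, 170),
    (2023, 1, 95), (2022, 6, 120), (2023, 6, 125),
]

# Group by year so each year can become its own line on the chart.
series_by_year = defaultdict(dict)
for year, month, sales in records:
    series_by_year[year][month] = sales

# Sort months so the x-axis preserves calendar order.
lines = {
    year: [(month, by_month[month]) for month in sorted(by_month)]
    for year, by_month in sorted(series_by_year.items())
}

for year, points in lines.items():
    print(year, points)
```

Each value in `lines` is a list of `(month, sales)` points in calendar order, so plotting one line per year makes a repeated December rise easy to compare across years.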
Topic: Interpreting correlation
A learner finds that advertising spend and revenue have a strong positive correlation in a small dataset. What is the safest conclusion?
Best answer: D
Explanation: Correlation describes association, not causation. A positive correlation can be useful evidence, but it does not prove which variable caused the other or whether another factor influenced both.
Why the other choices are weaker:
What this tests: Statistical interpretation, causation limits, and careful communication of findings.
Related topics: Correlation; Causation; Statistics; Interpretation; Evidence
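One way to see why correlation alone cannot establish direction is that the Pearson coefficient is symmetric in its two inputs. The sketch below computes it by hand on invented numbers; `pearson_r` is a helper written for this example, not a library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical small dataset.
ad_spend = [1, 2, 3, 4, 5]
revenue = [12, 15, 21, 24, 30]

r_xy = pearson_r(ad_spend, revenue)
r_yx = pearson_r(revenue, ad_spend)

print(round(r_xy, 3), round(r_yx, 3))
```

Swapping the arguments gives the same value, so the number by itself says nothing about whether spend drives revenue, revenue drives spend, or a third factor moves both.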
Tech Exam Lexicon and IT Mastery are independent study tools. They are not affiliated with, endorsed by, or sponsored by Python Institute or any certification body.