Python Institute PCAD Sample Questions with Explanations

These original sample questions are designed to help you check how the exam topics appear in decision-style prompts. They are not taken from the live exam.

Use these sample questions as a guided self-assessment for Certified Associate Data Analyst with Python (PCAD) topics such as data cleaning, joins, grouping, feature preparation, train/test separation, visualization, model evaluation, and responsible analysis. The prompts focus on choosing a defensible workflow when data shape and model behavior matter.

PCAD data-workflow sample questions

Work through each prompt before opening the explanation. PCAD questions reward more than library recall: you need to reason about tabular structure, leakage, missingness, metrics, and what a result actually proves.


Question 1

Topic: Joining datasets safely

A data analyst has an orders table with one row per order and a customers table with one row per customer. The goal is to add each customer’s region to each order without changing the number of order rows. Which check is most important before trusting the joined result?

  • A. Confirm that both tables have the same number of columns before joining.
  • B. Sort both tables alphabetically by every text column before joining.
  • C. Confirm that the customer key is unique in the customer table and that the join did not unexpectedly duplicate or drop order rows.
  • D. Convert every numeric column to text so the merge keeps all values.

Best answer: C

Explanation: The intended output is still one row per order. If the customer table has duplicate customer keys, the join multiplies the matching order rows. If an order's customer key is missing from the customer table or mismatched, an inner join drops that order, while a left join keeps it with a null region. Row-count and key-uniqueness checks protect the meaning of the result.
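The checks described here can be sketched with pandas. This is a minimal illustration, not a prescribed exam solution: the table contents and the column names `order_id`, `customer_id`, and `region` are hypothetical, and a left join is assumed so that every order row is kept.

```python
import pandas as pd

# Hypothetical example data: one row per order, one row per customer.
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "customer_id": [1, 2, 2, 9],  # customer 9 has no match in customers
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["North", "South", "East"],
})

# Check 1: the customer key must be unique on the lookup side.
assert customers["customer_id"].is_unique

# validate="many_to_one" makes pandas raise an error if the right-hand
# key is duplicated; indicator=True adds a "_merge" column that records
# where each row came from.
joined = orders.merge(
    customers,
    on="customer_id",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Check 2: a left join against a unique key keeps the order row count.
assert len(joined) == len(orders)

# Check 3: list orders whose key found no customer (region is null).
unmatched = joined[joined["_merge"] == "left_only"]
print(unmatched[["order_id", "customer_id"]])
```

With an inner join instead of a left join, the unmatched order would disappear from the result, which is why comparing row counts before and after the merge is worth doing for either join type.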

Why the other choices are weaker:

  • A is unrelated to whether join keys are valid.
  • B may help visual inspection, but sorting does not prove join correctness.
  • D can damage data types and does not solve key uniqueness or row preservation.

What this tests: Join keys, cardinality, row-count validation, and preserving the unit of analysis.

Related topics: Joins; Keys; Cardinality; Data validation; Tabular data


Question 2

Topic: Feature scaling without leakage

A learner is preparing a model with numeric features measured on different scales. They plan to standardize the features. Which workflow is strongest?

  • A. Fit the scaler on the training data, transform the training data, and use the same fitted scaler to transform validation or test data.
  • B. Fit one scaler on the training data and a separate scaler on the test data so each split is centered independently.
  • C. Fit the scaler on the full dataset before splitting so the test set has the best possible scale estimates.
  • D. Skip train/test separation because scaling already makes the data fair.

Best answer: A

Explanation: Preprocessing that learns from data should be fitted only on the training split. The validation or test split should be transformed with the already-fitted training scaler so evaluation reflects how the workflow handles unseen data.
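The fit-on-train, transform-both pattern can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions: the data is synthetic, the 80/20 split by position is acceptable only because the generated rows are independent, and the variable names are hypothetical. scikit-learn's `StandardScaler` follows the same idea through its `fit` and `transform` methods.

```python
import numpy as np

# Synthetic data: two features on very different scales (assumed values).
rng = np.random.default_rng(0)
X = rng.normal(loc=[50.0, 0.5], scale=[10.0, 0.1], size=(100, 2))

# Split first, so the test rows play no part in fitting the scaler.
X_train, X_test = X[:80], X[80:]

# "Fit": learn the scaling parameters from the training split only.
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)

# "Transform": apply the same training parameters to both splits.
X_train_scaled = (X_train - train_mean) / train_std
X_test_scaled = (X_test - train_mean) / train_std

print("train means:", X_train_scaled.mean(axis=0).round(3))
print("test means: ", X_test_scaled.mean(axis=0).round(3))
```

The scaled training features have mean 0 and standard deviation 1 by construction. The scaled test features will usually be close to that but not exact, and that small gap is expected: it reflects data the scaler never saw.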

Why the other choices are weaker:

  • B lets the test split define its own preprocessing parameters, which does not match production behavior.
  • C leaks information from the test set into training preparation.
  • D confuses scaling with model-evaluation discipline.

What this tests: Leakage control, preprocessing order, scaling, and train/test workflow.

Related topics: Feature scaling; Train/test split; Leakage; Pipelines; Model evaluation


Question 3

Topic: Class imbalance and evaluation

A binary classifier predicts rare fraud events. It reports 98 percent accuracy, but the positive class appears in only 2 percent of transactions. What should the analyst do next?

  • A. Accept the model because 98 percent accuracy is always strong.
  • B. Evaluate only the training score because fraud labels are hard to collect.
  • C. Remove all non-fraud rows so accuracy becomes easier to interpret.
  • D. Review class-specific metrics such as precision, recall, confusion matrix results, and the business cost of false positives and false negatives.

Best answer: D

Explanation: Accuracy can be misleading when one class is rare. With fraud in only 2 percent of transactions, a model that predicts every transaction as non-fraud would score about 98 percent accuracy while catching no fraud at all. Class-specific metrics and business-cost analysis show whether the model is useful for the actual decision.
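A small worked example makes the gap between accuracy and recall concrete. The sketch below uses hypothetical labels (1,000 transactions, 20 of them fraud) and a baseline that always predicts non-fraud; it is plain Python, and treating undefined precision as 0.0 is a convention chosen here for illustration.

```python
# Hypothetical labels: 1,000 transactions, 20 fraud (2 percent).
y_true = [1] * 20 + [0] * 980

# A baseline "model" that predicts non-fraud (0) for every transaction.
y_pred = [0] * 1000

# Confusion matrix counts, with fraud (1) as the positive class.
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
# Guard against division by zero when a denominator is empty.
recall = tp / (tp + fn) if (tp + fn) else 0.0
precision = tp / (tp + fp) if (tp + fp) else 0.0

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```

The baseline reaches 0.98 accuracy with a recall of 0, because all 20 fraud cases land in the false-negative cell. That is the pattern the confusion matrix exposes and the accuracy figure hides.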

Why the other choices are weaker:

  • A ignores class imbalance.
  • B avoids the unseen-data question and overstates model confidence.
  • C discards the negative class context needed to evaluate false positives.

What this tests: Imbalanced classification, metric selection, confusion matrices, and model interpretation.

Related topics: Accuracy; Precision; Recall; Class imbalance; Confusion matrix


Question 4

Topic: Responsible visualization

A chart compares average salaries across departments, but one department has two employees and another has two hundred. The averages differ substantially. What is the most responsible way to present the result?

  • A. Hide the sample sizes so the chart stays clean.
  • B. Show the averages with sample sizes and, when useful, distribution or uncertainty context before drawing strong conclusions.
  • C. Convert the chart to a pie chart because pie charts remove sample-size concerns.
  • D. Delete the smaller department because it makes the comparison inconvenient.

Best answer: B

Explanation: Averages can be meaningful, but sample size and distribution shape affect how confidently readers should interpret them. Showing counts, ranges, distribution markers, or uncertainty context helps prevent an overconfident conclusion from a small group.
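One way to keep sample size visible is to report the count and a rough uncertainty figure next to each average. The sketch below uses only the Python standard library; the department names and salary values are invented for illustration, and the standard error of the mean is just one possible uncertainty marker.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical salaries: one department with 2 people, one with 200.
salaries = {
    "Research": [150_000, 90_000],
    "Support": [52_000 + 100 * (i % 50) for i in range(200)],
}

summary = {}
for dept, values in salaries.items():
    n = len(values)
    summary[dept] = {
        "n": n,
        "mean": mean(values),
        # Standard error of the mean: sample std dev divided by sqrt(n).
        "sem": stdev(values) / sqrt(n),
    }

for dept, stats in summary.items():
    print(
        f"{dept}: mean={stats['mean']:,.0f}  "
        f"n={stats['n']}  SEM={stats['sem']:,.0f}"
    )
```

In this invented data the two-person department has the higher average but also a far larger standard error, so a chart or table that shows `n` and the uncertainty alongside the mean tells readers how much weight the comparison can bear.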

Why the other choices are weaker:

  • A removes critical context.
  • C changes chart type without addressing interpretation risk.
  • D silently changes the population and can bias the analysis.

What this tests: Responsible communication, sample size, distributions, and avoiding misleading summaries.

Related topics: Visualization; Summary statistics; Sample size; Bias; Communication

Independent study note

Tech Exam Lexicon and IT Mastery are independent study tools. They are not affiliated with, endorsed by, or sponsored by Python Institute or any certification body.

Revised on Sunday, May 10, 2026