Data processing NABTEB exam questions and answers
Here they are:
Question: What is the primary purpose of data preprocessing in machine learning?
a) Enhancing model interpretability
b) Reducing dimensionality
c) Improving data quality
d) Accelerating model training
Answer: c) Improving data quality
Question: Which technique is most commonly used as a simple default for handling missing numerical data in a dataset?
a) Mean imputation
b) Mode imputation
c) Median imputation
d) Random imputation
Answer: a) Mean imputation
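As a rough illustration of mean imputation, the sketch below fills missing entries (represented here as None, which is an assumption about how the data is stored) with the mean of the observed values. The function name impute_mean and the sample ages are made up for the example.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # raises StatisticsError if nothing was observed
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]
print(impute_mean(ages))  # [20, 30, 30, 40, 30]
```

Median or mode imputation would follow the same pattern with a different fill value.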
Question: Why is scaling important in data preprocessing?
a) To make data visually appealing
b) To speed up data loading
c) To standardize variable ranges
d) To reduce data dimensionality
Answer: c) To standardize variable ranges
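One common way to standardize ranges is min-max scaling, which maps every value into the interval from 0 to 1. The sketch below is a minimal version; the function name and the income figures are illustrative only.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are identical, so there is no range to scale.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

incomes = [20000, 50000, 80000]
print(min_max_scale(incomes))  # [0.0, 0.5, 1.0]
```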
Question: What is the purpose of outlier detection in data preprocessing?
a) To remove irrelevant data
b) To identify and handle extreme values
c) To increase dataset size
d) To introduce noise for model robustness
Answer: b) To identify and handle extreme values
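A simple way to flag extreme values is the interquartile range (IQR) rule. The sketch below assumes the common 1.5 x IQR threshold, which is a convention rather than a fixed standard; the function name and readings are hypothetical.

```python
from statistics import quantiles

def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

readings = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(readings))  # [95]
```

Once flagged, an outlier can be removed, capped, or investigated, depending on whether it is an error or a genuine observation.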
Question: Which transformation is suitable for reducing skew in data?
a) Min-Max scaling
b) Z-score normalization
c) Log transformation
d) Standard deviation scaling
Answer: c) Log transformation
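The effect of a log transformation on right-skewed data can be seen in a small sketch. It uses log(1 + x) so that zero is handled safely, which assumes the values are non-negative; the sample numbers are invented.

```python
import math

def log_transform(values):
    """Apply log(1 + x) to compress large values; assumes x >= 0."""
    return [math.log1p(v) for v in values]

skewed = [1, 10, 100, 1000]
transformed = log_transform(skewed)
# The raw values span three orders of magnitude; the transformed ones
# all fall below 7 while keeping the same ordering.
print([round(t, 2) for t in transformed])
```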
Question: In feature engineering, what does one-hot encoding accomplish?
a) Reducing feature dimensionality
b) Handling missing data
c) Converting categorical variables into binary vectors
d) Scaling numerical features
Answer: c) Converting categorical variables into binary vectors
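One-hot encoding can be sketched in a few lines. Each category gets its own position, and every value becomes a vector with a single 1. The function name and colour values are illustrative, and sorting the categories is just one way to fix the column order.

```python
def one_hot(values):
    """Return the category list and one binary vector per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the column that matches this value
        rows.append(row)
    return categories, rows

categories, rows = one_hot(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(rows)        # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```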
Question: What is the purpose of cross-validation in the context of data processing?
a) Enhancing model interpretability
b) Evaluating model performance on multiple subsets of data
c) Imputing missing values
d) Increasing the number of features
Answer: b) Evaluating model performance on multiple subsets of data
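The idea of evaluating on multiple subsets can be sketched as k-fold index splitting: each fold serves as the test set exactly once while the rest is used for training. This is a simplified version without shuffling, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k consecutive folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

for train_idx, test_idx in k_fold_indices(6, 3):
    print(train_idx, test_idx)
```

A model would be trained and scored once per fold, and the scores averaged.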
Question: How does PCA (Principal Component Analysis) contribute to data preprocessing?
a) It handles missing data
b) It reduces dimensionality
c) It normalizes data distribution
d) It transforms categorical variables
Answer: b) It reduces dimensionality
Question: Which technique is suitable for handling categorical variables with ordinal relationships?
a) Label encoding
b) One-hot encoding
c) Binary encoding
d) Target encoding
Answer: a) Label encoding
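For ordinal categories, the integer codes should follow the natural order of the categories. The sketch below assumes the order is supplied explicitly; SIZE_ORDER and the function name are made up for illustration.

```python
SIZE_ORDER = ["small", "medium", "large"]  # assumed ordering, lowest to highest

def encode_ordinal(values, order):
    """Map each category to its rank in the given order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

print(encode_ordinal(["medium", "small", "large"], SIZE_ORDER))  # [1, 0, 2]
```

Note that encoders which assign codes alphabetically may not respect the intended order, so it is worth checking the mapping.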
Question: What is the purpose of data discretization?
a) Handling missing data
b) Converting numerical variables into categorical variables
c) Scaling data
d) Reducing dimensionality
Answer: b) Converting numerical variables into categorical variables
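Discretization (binning) can be sketched as follows. The bin edges and labels here are assumptions chosen for the example: values below 18 are "child", values from 18 up to but not including 65 are "adult", and the rest are "senior".

```python
from bisect import bisect_right

def discretize(values, edges, labels):
    """Assign each value the label of the bin it falls into.

    edges must be sorted, and labels needs one more entry than edges.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("labels must have exactly len(edges) + 1 entries")
    return [labels[bisect_right(edges, v)] for v in values]

ages = [5, 17, 18, 40, 70]
print(discretize(ages, [18, 65], ["child", "adult", "senior"]))
# ['child', 'child', 'adult', 'adult', 'senior']
```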
Question: How does the Bag-of-Words representation contribute to natural language processing?
a) It encodes word order
b) It represents text as an unordered collection of words and their counts
c) It handles grammatical errors
d) It increases sentence complexity
Answer: b) It represents text as an unordered collection of words and their counts
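Because Bag-of-Words discards word order and keeps only how often each word occurs, two sentences with the same words produce the same representation. A minimal sketch, assuming simple lowercasing and whitespace tokenization:

```python
from collections import Counter

def bag_of_words(text):
    """Count word occurrences, ignoring case and word order."""
    return Counter(text.lower().split())

a = bag_of_words("The dog chased the cat")
b = bag_of_words("The cat chased the dog")
print(a["the"])  # 2
print(a == b)    # True: the meaning differs, but the bags are identical
```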
Question: What role does feature scaling play in the k-nearest neighbors (KNN) algorithm?
a) It improves model interpretability
b) It speeds up the training process
c) It ensures equal contribution of features in distance calculations
d) It reduces the number of neighbors considered
Answer: c) It ensures equal contribution of features in distance calculations
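The sketch below shows why scaling matters for distance-based methods such as KNN. With the made-up data (age in years, income), the unscaled distance is dominated by income, so the nearest neighbour changes once both columns are min-max scaled. All names and figures are illustrative.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale_columns(rows):
    """Scale every column to the range 0 to 1."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1 for c in columns]  # avoid division by zero
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)] for row in rows]

def nearest(rows, query_index):
    """Index of the row closest to rows[query_index], excluding itself."""
    query = rows[query_index]
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: euclidean(rows[i], query))

people = [[25, 50000], [60, 51000], [26, 54000], [40, 60000]]
print(nearest(people, 0))                         # 1: income dominates the distance
print(nearest(min_max_scale_columns(people), 0))  # 2: age now counts as well
```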
Question: In time series data, what is the purpose of lag features?
a) They handle missing data
b) They encode categorical information
c) They capture temporal dependencies
d) They standardize variable ranges
Answer: c) They capture temporal dependencies
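Lag features simply place earlier observations alongside the current one, so a model can use the recent past as input. A minimal sketch, with a hypothetical function name and sales figures:

```python
def add_lags(series, lags):
    """Build one row per time step with the current value and its lagged values."""
    rows = []
    for t in range(len(series)):
        row = {"value": series[t]}
        for lag in lags:
            # Early time steps have no history, so the lag is left as None.
            row[f"lag_{lag}"] = series[t - lag] if t - lag >= 0 else None
        rows.append(row)
    return rows

sales = [10, 12, 15, 14]
for row in add_lags(sales, [1, 2]):
    print(row)
```

Rows with missing lags are usually dropped or imputed before training.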
Question: Why might you use feature scaling before applying the k-means clustering algorithm?
a) To visualize data better
b) To handle missing values
c) To standardize variable ranges
d) To increase the number of clusters
Answer: c) To standardize variable ranges
Question: What does the term "data augmentation" refer to in the context of image processing?
a) Increasing the size of the dataset by duplicating records
b) Enhancing model interpretability
c) Generating new training samples by applying transformations to existing data
d) Reducing the number of features
Answer: c) Generating new training samples by applying transformations to existing data
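As a toy illustration of augmentation, the sketch below treats an image as a list of pixel rows and produces flipped copies. Real pipelines typically also use rotations, crops, and colour changes; the function names here are made up.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]

def augment(image):
    """Return the original image plus two transformed variants."""
    return [image, flip_horizontal(image), flip_vertical(image)]

image = [[1, 2],
         [3, 4]]
for variant in augment(image):
    print(variant)
```

Unlike plain duplication, each variant is a slightly different sample that keeps the original label.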
Question: Which technique is used for handling the class imbalance problem in classification tasks?
a) Feature scaling
b) Data discretization
c) Oversampling minority class
d) Principal Component Analysis (PCA)
Answer: c) Oversampling minority class
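Random oversampling can be sketched as drawing extra copies of minority-class samples until every class matches the largest one. The function name, the fixed seed, and the sample data are assumptions made for the example.

```python
import random
from collections import Counter

def oversample_minority(samples, labels, seed=0):
    """Randomly repeat samples of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels

samples = [1.0, 1.2, 0.9, 1.1, 5.0]
labels = ["neg", "neg", "neg", "neg", "pos"]
new_samples, new_labels = oversample_minority(samples, labels)
print(Counter(new_labels))
```

Oversampling is normally applied to the training split only, so that repeated samples do not leak into the test data.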
Question: What is the purpose of stratified sampling in the context of data splitting?
a) It preserves the original class proportions in both the training and testing sets
b) It removes outliers from the dataset
c) It handles missing values
d) It increases the training set size
Answer: a) It preserves the original class proportions in both the training and testing sets
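Stratified splitting can be sketched by splitting each class separately, so the training and testing sets keep roughly the same class proportions as the full dataset. The function name, test fraction, and seed are assumptions for the example.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices), splitting each class on its own."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

labels = ["a"] * 8 + ["b"] * 4
train_idx, test_idx = stratified_split(labels)
# Both splits keep the 2:1 ratio of class "a" to class "b".
print(len(train_idx), len(test_idx))  # 9 3
```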
Question: How does the concept of feature engineering contribute to machine learning models?
a) It improves data visualization
b) It enhances model interpretability
c) It transforms raw data into informative features for model training
d) It reduces the need for data preprocessing
Answer: c) It transforms raw data into informative features for model training
Question: What is the purpose of a validation set in the model training process?
a) To train the model
b) To tune hyperparameters and detect overfitting
c) To test the model on unseen data
d) To handle missing values
Answer: b) To tune hyperparameters and detect overfitting
Question: In natural language processing, what is lemmatization?
a) Removing stop words from text
b) Converting words to their base or dictionary form (lemma)
c) Encoding words as numerical vectors
d) Increasing the vocabulary size
Answer: b) Converting words to their base or dictionary form (lemma)