Are you looking for the Data Science with Scala exam answers from Cognitive Class? If so, this article collects all the questions and answers from the Cognitive Class Data Science with Scala quiz. I used the answers collected here to work through every question in this exam.

In this course, you will learn about basic statistics and data types, preparing data, feature engineering, fitting a model, and pipeline and grid search. Apache Spark™ is a fast and general engine for large-scale data processing, with built-in modules for streaming, machine learning, and graph processing. This course shows you how to use Spark’s machine learning pipelines to fit models and search for optimal hyperparameters using a Spark cluster.

| Detail | Value |
| --- | --- |
| Organization | Cognitive Class |
| Eligibility | Data Scientists and Data Engineers interested in working with Big Data |
| Level | Beginner |
| Duration | 6 hours |
| Language | English |
| Price | Free |
| Certificate | Yes |


## Cognitive Class – Data Science with Scala Answers

### Module 1: Basic Statistics and Data Types

**1. You import MLlib’s vectors from?**

**2. Select the types of distributed matrices:**

**3. How would you calculate the mean of the following?**

    val observations: RDD[Vector] = sc.parallelize(Array(
      Vectors.dense(1.0, 2.0),
      Vectors.dense(4.0, 5.0),
      Vectors.dense(7.0, 8.0)))
    val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)
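For intuition, the column-wise mean that `Statistics.colStats` reports can be sketched in plain Scala, with no Spark cluster involved. This is only an illustration under stated assumptions: the object and function names (`ColumnMeanSketch`, `columnMeans`) are hypothetical and not part of MLlib.

```scala
// Plain-Scala sketch (no Spark needed); all names here are hypothetical.
object ColumnMeanSketch {
  // Average each column across all rows; every row must have the same length.
  def columnMeans(rows: Seq[Array[Double]]): Array[Double] = {
    require(rows.nonEmpty, "need at least one row")
    val width = rows.head.length
    require(rows.forall(_.length == width), "rows must have equal length")
    val sums = new Array[Double](width)
    for (row <- rows; j <- 0 until width) sums(j) += row(j)
    sums.map(_ / rows.size)
  }

  def main(args: Array[String]): Unit = {
    // Same values as the observations in the question above.
    val observations = Seq(Array(1.0, 2.0), Array(4.0, 5.0), Array(7.0, 8.0))
    println(columnMeans(observations).mkString("[", ", ", "]")) // prints [4.0, 5.0]
  }
}
```

With Spark on the classpath, the same per-column values are read from the summary object through `summary.mean`.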

**4. What task do the following lines of code perform?**

    import org.apache.spark.mllib.random.RandomRDDs._
    val million = poissonRDD(sc, mean = 1.0, size = 1000000L, numPartitions = 10)

**5. MLlib uses the compressed sparse column format for sparse matrices; as such, it only keeps the non-zero entries?**
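To make the compressed sparse column (CSC) layout concrete, here is a small plain-Scala sketch that builds the three CSC arrays (column pointers, row indices, values) from a dense matrix. It is an illustration only: `CscSketch`, `Csc`, and `fromDense` are hypothetical names, and the code does not use MLlib's own matrix classes.

```scala
import scala.collection.mutable.ArrayBuffer

// Plain-Scala sketch of CSC storage; all names here are hypothetical.
object CscSketch {
  // The three arrays of compressed sparse column storage, plus the shape.
  final case class Csc(numRows: Int, numCols: Int,
                       colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double])

  // Build CSC arrays from a dense matrix given as rows; zeros are skipped.
  def fromDense(dense: Array[Array[Double]]): Csc = {
    val numRows = dense.length
    val numCols = if (numRows == 0) 0 else dense(0).length
    val colPtrs = new Array[Int](numCols + 1)
    val rowIndices = ArrayBuffer.empty[Int]
    val values = ArrayBuffer.empty[Double]
    for (j <- 0 until numCols) {
      for (i <- 0 until numRows if dense(i)(j) != 0.0) {
        rowIndices += i
        values += dense(i)(j)
      }
      colPtrs(j + 1) = values.length // number of entries stored up to the end of column j
    }
    Csc(numRows, numCols, colPtrs, rowIndices.toArray, values.toArray)
  }

  def main(args: Array[String]): Unit = {
    // 3 x 2 matrix with three non-zero entries.
    val m = Array(Array(9.0, 0.0), Array(0.0, 8.0), Array(0.0, 6.0))
    val csc = fromDense(m)
    println(csc.colPtrs.mkString(","))    // prints 0,1,3
    println(csc.rowIndices.mkString(",")) // prints 0,1,2
    println(csc.values.mkString(","))     // prints 9.0,8.0,6.0
  }
}
```

Only the three non-zero values are stored; the column pointers record where each column's entries start and end in the `values` array.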

### Module 2: Preparing Data

**1. For a DataFrame object, the method describe calculates the?**

**2. Which line of code drops the rows that contain null values? Select the best answer.**

**3. What task do the following lines of code perform?**

    val lr = new LogisticRegression()
    lr.setMaxIter(10).setRegParam(0.01)
    val model1 = lr.fit(training)

**4. The StandardScalerModel transforms the data such that?**
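As a rough illustration of what standard scaling does to a single column, the plain-Scala sketch below centres the values on their mean and divides by the sample standard deviation. It assumes both centring and scaling are switched on (in Spark's `StandardScaler` these are controlled by `setWithMean` and `setWithStd`); `StandardizeSketch` and `standardize` are hypothetical names, not Spark API.

```scala
// Plain-Scala sketch of standardization; all names here are hypothetical.
object StandardizeSketch {
  // Centre a column on its mean and divide by its sample standard deviation (n - 1).
  def standardize(xs: Seq[Double]): Seq[Double] = {
    require(xs.size > 1, "need at least two values")
    val mean = xs.sum / xs.size
    val variance = xs.map(x => (x - mean) * (x - mean)).sum / (xs.size - 1)
    val std = math.sqrt(variance)
    require(std > 0.0, "values must not all be equal")
    xs.map(x => (x - mean) / std)
  }

  def main(args: Array[String]): Unit = {
    println(standardize(Seq(1.0, 2.0, 3.0))) // prints List(-1.0, 0.0, 1.0)
  }
}
```

After the transformation the column has mean 0 and standard deviation 1.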

### Module 3: Feature Engineering

**1. Spark ML works with?**

**2. The function IndexToString() performs one-hot encoding?**

**3. Principal Component Analysis is primarily used for?**

**4. One important step prior to using PCA is?**

### Module 4: Fitting a Model

**1. You can use decision trees for?**

**2. The following line of code: val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))**

**3. In the Random Forest Classifier constructor, .setNumTrees()?**

**4. Elastic net regularization uses?**
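The elastic net penalty can be written down in a few lines of plain Scala. The sketch below follows the usual formulation, in which a mixing parameter alpha blends the L1 norm and the squared L2 norm of the weights and lambda scales the result; `ElasticNetSketch` and `penalty` are hypothetical names used only for illustration.

```scala
// Plain-Scala sketch of the elastic net penalty; all names here are hypothetical.
object ElasticNetSketch {
  // penalty = lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)
  def penalty(weights: Seq[Double], lambda: Double, alpha: Double): Double = {
    require(alpha >= 0.0 && alpha <= 1.0, "alpha must lie in [0, 1]")
    val l1 = weights.map(w => math.abs(w)).sum
    val l2Squared = weights.map(w => w * w).sum
    lambda * (alpha * l1 + (1.0 - alpha) / 2.0 * l2Squared)
  }

  def main(args: Array[String]): Unit = {
    val w = Seq(3.0, -4.0)
    println(penalty(w, 1.0, 1.0)) // pure L1: prints 7.0
    println(penalty(w, 1.0, 0.0)) // pure L2: prints 12.5
    println(penalty(w, 1.0, 0.5)) // equal mix: prints 9.75
  }
}
```

Setting alpha to 1 gives a lasso-style (L1) penalty and setting it to 0 gives a ridge-style (L2) penalty; values in between combine the two.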

### Module 5: Pipeline and Grid Search

**1. What task does the following code perform: withColumn("paperscore", data("A2") * 4 + data("A") * 3)?**

**2. In an estimator?**

**3. Which is not a valid type of Evaluator in MLlib?**

**4. In the following lines of code, the last transform in the pipeline is a:**

    val rf = new RandomForestClassifier().setFeaturesCol("assembled").setLabelCol("status").setSeed(42)
    import org.apache.spark.ml.Pipeline
    val pipeline = new Pipeline().setStages(Array(value_band_indexer, category_indexer, label_indexer, assembler, rf))

### Final Exam Answers

**1. What is not true about labeled points?**

**2. Which is true about column pointers in sparse matrices?**

**3. What is the name of the most basic type of distributed matrix?**

**4. A perfect correlation is represented by what value?**

**5. A MinMaxScaler is a transformer which:**
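As a small illustration of min-max scaling, the plain-Scala sketch below rescales one column linearly into a target range, [0, 1] by default. `MinMaxSketch` and `rescale` are hypothetical names, and mapping a constant column to the midpoint of the range is a convention assumed here for the sketch.

```scala
// Plain-Scala sketch of min-max rescaling; all names here are hypothetical.
object MinMaxSketch {
  // Linearly map the observed [min, max] of the column onto [lo, hi].
  def rescale(xs: Seq[Double], lo: Double = 0.0, hi: Double = 1.0): Seq[Double] = {
    require(xs.nonEmpty, "need at least one value")
    val (mn, mx) = (xs.min, xs.max)
    if (mx == mn) xs.map(_ => 0.5 * (lo + hi)) // constant column: use the midpoint
    else xs.map(x => (x - mn) / (mx - mn) * (hi - lo) + lo)
  }

  def main(args: Array[String]): Unit = {
    println(rescale(Seq(2.0, 4.0, 6.0))) // prints List(0.0, 0.5, 1.0)
  }
}
```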

**6. Which is not a supported Random Data Generation distribution?**

**7. Sampling without replacement means:**
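The difference between the two sampling schemes is easy to see in plain Scala. In the sketch below, sampling without replacement shuffles the population and takes the first n elements, so no element can be picked twice, while sampling with replacement draws each element independently. `SamplingSketch` and its methods are hypothetical names; this is not how Spark implements sampling on distributed data.

```scala
import scala.util.Random

// Plain-Scala sketch of the two sampling schemes; all names here are hypothetical.
object SamplingSketch {
  // Each element of the population can appear at most once in the sample.
  def sampleWithoutReplacement[A](population: Seq[A], n: Int, seed: Long): Seq[A] = {
    require(n >= 0 && n <= population.size, "cannot draw more items than the population holds")
    new Random(seed).shuffle(population).take(n)
  }

  // Each draw is independent, so the same element may appear several times.
  def sampleWithReplacement[A](population: Seq[A], n: Int, seed: Long): Seq[A] = {
    require(n >= 0 && (population.nonEmpty || n == 0), "cannot sample from an empty population")
    val items = population.toIndexedSeq
    val rng = new Random(seed)
    Seq.fill(n)(items(rng.nextInt(items.size)))
  }

  def main(args: Array[String]): Unit = {
    val population = 1 to 10
    println(sampleWithoutReplacement(population, 4, seed = 42L)) // four distinct values
    println(sampleWithReplacement(population, 4, seed = 42L))    // repeats are possible
  }
}
```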

**8. What are the supported types of hypothesis testing?**

**9. For Kernel Density Estimation, which kernel is supported by Spark?**

**10. Which DataFrames statistics method computes the pairwise frequency table of the given columns?**

**11. Which is not true about the fill method for DataFrame NA functions?**

**12. Which transformer listed below is used for Natural Language processing?**

**13. Which is true about the Mahalanobis Distance?**

**14. Which is true about OneHotEncoder?**

**15. Principal Component Analysis is:**

**16. MLlib’s implementation of decision trees:**

**17. Which is not a tunable of SparkML decision trees?**

**18. Which is true about Random Forests?**

**19. When comparing Random Forest versus Gradient-Boosted Trees, what must you consider?**

**20. Which is not a valid type of Evaluator in MLlib?**

### Wrap Up

I hope this article helps you find all the Cognitive Class Data Science with Scala quiz answers. If it helped you learn something new for free, share it on social media so others can find it, and check out the other free courses we have shared here.