How to Analyze the Essentials of MLlib in Spark

Published: 2021-12-16 18:43:28 | Source: Yisu Cloud (億速云) | Views: 160 | Author: 柒染 | Category: Cloud Computing

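Without prior experience, it can be hard to see what Spark's machine learning library really consists of. This article summarizes its two package families (org.apache.spark.ml and org.apache.spark.mllib), the core concepts, the available algorithms, data types, and models, and ends with a worked example.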

org.apache.spark.ml (http://spark.apache.org/docs/latest/ml-guide.html)

org.apache.spark.ml.attribute
org.apache.spark.ml.classification
org.apache.spark.ml.clustering
org.apache.spark.ml.evaluation
org.apache.spark.ml.feature
org.apache.spark.ml.param
org.apache.spark.ml.recommendation
org.apache.spark.ml.regression
org.apache.spark.ml.source.libsvm
org.apache.spark.ml.tree
org.apache.spark.ml.tuning
org.apache.spark.ml.util

org.apache.spark.mllib (http://spark.apache.org/docs/latest/mllib-guide.html)

org.apache.spark.mllib.classification
org.apache.spark.mllib.clustering
org.apache.spark.mllib.evaluation
org.apache.spark.mllib.feature
org.apache.spark.mllib.fpm
org.apache.spark.mllib.linalg
org.apache.spark.mllib.linalg.distributed
org.apache.spark.mllib.pmml
org.apache.spark.mllib.random
org.apache.spark.mllib.rdd
org.apache.spark.mllib.recommendation
org.apache.spark.mllib.regression
org.apache.spark.mllib.stat
org.apache.spark.mllib.stat.distributed
org.apache.spark.mllib.stat.test
org.apache.spark.mllib.tree
org.apache.spark.mllib.tree.configuration
org.apache.spark.mllib.tree.impurity
org.apache.spark.mllib.tree.loss
org.apache.spark.mllib.tree.model
org.apache.spark.mllib.util

ML concepts

DataFrame: Spark ML uses DataFrame from Spark SQL as an ML dataset, which can hold a variety of data types. E.g., a DataFrame could have different columns storing text, feature vectors, true labels, and predictions.
Transformer: A Transformer is an algorithm which can transform one DataFrame into another DataFrame. E.g., an ML model is a Transformer which transforms DataFrame with features into a DataFrame with predictions.
Estimator: An Estimator is an algorithm which can be fit on a DataFrame to produce a Transformer. E.g., a learning algorithm is an Estimator which trains on a DataFrame and produces a model.
Pipeline: A Pipeline chains multiple Transformers and Estimators together to specify an ML workflow.
Parameter: All Transformers and Estimators now share a common API for specifying parameters.
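
The concepts above can be sketched as a small text-classification Pipeline. This is a minimal sketch, assuming the Spark 1.6-style API used elsewhere in this article (SQLContext, spark.ml) and a local master; the toy rows, column names, and application name are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SQLContext

// Assumption: local mode; inside spark-shell getOrCreate reuses the existing context.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("PipelineSketch").setMaster("local[2]"))
val sqlContext = SQLContext.getOrCreate(sc)

// DataFrame: columns hold ids, raw text, and true labels (toy data).
val trainingDocs = sqlContext.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

// Transformers: Tokenizer and HashingTF each map one DataFrame to another.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF()
  .setNumFeatures(1000)
  .setInputCol(tokenizer.getOutputCol)
  .setOutputCol("features")

// Estimator: LogisticRegression is fit on a DataFrame to produce a model.
val logReg = new LogisticRegression().setMaxIter(10).setRegParam(0.01)

// Pipeline: chains the stages; fitting it yields a PipelineModel (a Transformer).
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, logReg))
val pipelineModel: PipelineModel = pipeline.fit(trainingDocs)

// Apply the fitted pipeline to unlabeled documents.
val testDocs = sqlContext.createDataFrame(Seq(
  (4L, "spark i j k"),
  (5L, "l m n")
)).toDF("id", "text")
val predictions = pipelineModel.transform(testDocs)
predictions.select("id", "text", "prediction").show()
```

Fitting the Pipeline runs the stages in order: the two feature Transformers are applied to the training data, then the Estimator is fit on their output. The resulting PipelineModel replays the same steps on new data.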

ML classification and regression

Classification
	Logistic regression
	Decision tree classifier
	Random forest classifier
	Gradient-boosted tree classifier
	Multilayer perceptron classifier
	One-vs-Rest classifier (a.k.a. One-vs-All)
Regression
	Linear regression
	Decision tree regression
	Random forest regression
	Gradient-boosted tree regression
	Survival regression
Decision trees
Tree Ensembles
	Random Forests
	Gradient-Boosted Trees (GBTs)

ML clustering

K-means
Latent Dirichlet allocation (LDA)

MLlib data types

Local vector
Labeled point
Local matrix
Distributed matrix
	RowMatrix
	IndexedRowMatrix
	CoordinateMatrix
	BlockMatrix
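
As a rough illustration of the data types listed above, the sketch below builds a local vector in dense and sparse form, two labeled points, a local matrix, and a RowMatrix. It assumes the org.apache.spark.mllib packages on the classpath and a local master; all values are made up.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrices, Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.regression.LabeledPoint

// Local vector (1.0, 0.0, 3.0) in dense form.
val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)
// The same vector in sparse form: size, indices of non-zero entries, their values.
val sv: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// Labeled point: a label paired with a feature vector.
val pos = LabeledPoint(1.0, dv)
val neg = LabeledPoint(0.0, sv)

// Local 3x2 matrix; values are given in column-major order,
// so column 0 is (1, 3, 5) and column 1 is (2, 4, 6).
val dm: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))

// Distributed matrix: a RowMatrix is backed by an RDD of local vectors.
// Assumption: local mode; inside spark-shell getOrCreate reuses the existing context.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("DataTypesSketch").setMaster("local[2]"))
val rowMatrix = new RowMatrix(sc.parallelize(Seq(dv, sv)))
println(s"RowMatrix is ${rowMatrix.numRows()} x ${rowMatrix.numCols()}")
```

Local types live on a single machine and need no SparkContext; only the distributed matrix types are backed by RDDs.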

MLlib classification and regression

Binary Classification: linear SVMs, logistic regression, decision trees, random forests, gradient-boosted trees, naive Bayes
Multiclass Classification: logistic regression, decision trees, random forests, naive Bayes
Regression: linear least squares, Lasso, ridge regression, decision trees, random forests, gradient-boosted trees, isotonic regression
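
To make the RDD-based API concrete, here is a minimal sketch of binary classification with logistic regression in org.apache.spark.mllib. It assumes a local master; the four toy points and the names labeledPoints and lrModel are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Assumption: local mode; inside spark-shell getOrCreate reuses the existing context.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("MLlibClassificationSketch").setMaster("local[2]"))

// MLlib algorithms train on an RDD[LabeledPoint] rather than a DataFrame.
val labeledPoints = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)),
  LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0)),
  LabeledPoint(0.0, Vectors.dense(2.0, 1.3, 1.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 1.2, -0.5))
)).cache()

// Train a binary logistic regression model with L-BFGS.
val lrModel = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(labeledPoints)

// Score the training points and compute the fraction predicted correctly.
val predictionAndLabel = labeledPoints.map(p => (lrModel.predict(p.features), p.label))
val accuracy = predictionAndLabel
  .filter { case (prediction, label) => prediction == label }
  .count().toDouble / labeledPoints.count()
println(s"Training accuracy: $accuracy")
```

Accuracy on the training set is shown only to exercise predict; a real evaluation would use held-out data.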

MLlib clustering

K-means
Gaussian mixture
Power iteration clustering (PIC, mostly used for image recognition)
Latent Dirichlet allocation (LDA, mostly used for topic classification)
Bisecting k-means
Streaming k-means
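
As one example from the list above, K-means in org.apache.spark.mllib can be sketched as follows. It assumes a local master and six made-up 2-D points forming two well-separated groups; k and the iteration count are arbitrary choices.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

// Assumption: local mode; inside spark-shell getOrCreate reuses the existing context.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("KMeansSketch").setMaster("local[2]"))

// Toy data: one group near (0, 0) and one near (9, 9).
val points = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1), Vectors.dense(0.2, 0.0),
  Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1), Vectors.dense(9.2, 9.0)
)).cache()

// Train with k = 2 clusters and at most 20 iterations.
val numClusters = 2
val numIterations = 20
val kmeansModel: KMeansModel = KMeans.train(points, numClusters, numIterations)

// Within-set sum of squared errors: lower means tighter clusters.
val wssse = kmeansModel.computeCost(points)
println(s"Within Set Sum of Squared Errors = $wssse")

// Assign new points to the nearest learned center.
val clusterNearOrigin = kmeansModel.predict(Vectors.dense(0.05, 0.05))
val clusterNearNine = kmeansModel.predict(Vectors.dense(9.05, 9.05))
kmeansModel.clusterCenters.foreach(println)
```

The returned KMeansModel is one of the model classes listed in the MLlib Models section of this article.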

MLlib Models

DecisionTreeModel
DistributedLDAModel
GaussianMixtureModel
GradientBoostedTreesModel
IsotonicRegressionModel
KMeansModel
LassoModel
LDAModel
LinearRegressionModel
LocalLDAModel
LogisticRegressionModel
MatrixFactorizationModel
NaiveBayesModel
PowerIterationClusteringModel
RandomForestModel
RidgeRegressionModel
StreamingKMeansModel
SVMModel
Word2VecModel

Example

import org.apache.spark.ml.classification.LogisticRegression 
import org.apache.spark.ml.param.ParamMap 
import org.apache.spark.mllib.linalg.{Vector, Vectors} 
import org.apache.spark.sql.Row 

// Training data: (label, features) tuples turned into a DataFrame.
val training = sqlContext.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (0.0, Vectors.dense(2.0, 1.3, 1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")

// LogisticRegression is an Estimator; print its parameters and their documentation.
val lr = new LogisticRegression()
println("LogisticRegression parameters:\n" + lr.explainParams() + "\n")

// Set parameters with setter methods, then fit a first model.
lr.setMaxIter(10).setRegParam(0.01)
val model1 = lr.fit(training)
println("Model 1 was fit using parameters: " + model1.parent.extractParamMap)

// Alternatively, specify parameters with a ParamMap.
val paramMap = ParamMap(lr.maxIter -> 20)
  .put(lr.maxIter, 30) // overwrites the maxIter value set just above
  .put(lr.regParam -> 0.1, lr.threshold -> 0.55) // several parameters at once
val paramMap2 = ParamMap(lr.probabilityCol -> "myProbability") // rename the probability column
val paramMapCombined = paramMap ++ paramMap2

// Parameters passed to fit() override those set earlier with setters.
val model2 = lr.fit(training, paramMapCombined)
println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)

// Test data (the original snippet omitted the val keyword here).
val test = sqlContext.createDataFrame(Seq(
  (1.0, Vectors.dense(-1.0, 1.5, 1.3)),
  (0.0, Vectors.dense(3.0, 2.0, -0.1)),
  (1.0, Vectors.dense(0.0, 2.2, -1.5))
)).toDF("label", "features")

// Predict on the test data; probabilities appear in the renamed "myProbability" column.
model2.transform(test)
  .select("features", "label", "myProbability", "prediction")
  .collect()
  .foreach { case Row(features: Vector, label: Double, prob: Vector, prediction: Double) =>
    println(s"($features, $label) -> prob=$prob, prediction=$prediction")
  }

