A first look at scikit-learn
KFold
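k-fold cross-validation: the data are split into k folds. In each round, k-1 folds form the training set and the remaining fold is the test set, so every fold serves as the test set exactly once.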
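To make the idea concrete, here is a minimal from-scratch sketch of contiguous, unshuffled k-fold splitting in NumPy, checked against KFold. The helper `manual_kfold` is hypothetical (it is not part of scikit-learn). It follows the fold-size rule documented for KFold: the first `n_samples % n_splits` folds get one extra sample, so fold sizes differ by at most one.

```python
import numpy as np
from sklearn.model_selection import KFold


def manual_kfold(n_samples, n_splits):
    """Hypothetical helper: yield (train_idx, test_idx) for contiguous,
    unshuffled k-fold splits. The first n_samples % n_splits folds get
    one extra sample."""
    indices = np.arange(n_samples)
    fold_sizes = np.full(n_splits, n_samples // n_splits, dtype=int)
    fold_sizes[: n_samples % n_splits] += 1
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = indices[start:stop]
        # Training set = every index outside the current test fold
        train_idx = np.concatenate([indices[:start], indices[stop:]])
        yield train_idx, test_idx
        start = stop


X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
manual = list(manual_kfold(len(X), n_splits=3))
library = list(KFold(n_splits=3).split(X))

for (tr_m, te_m), (tr_l, te_l) in zip(manual, library):
    print("test:", te_m, "same as KFold:",
          np.array_equal(tr_m, tr_l) and np.array_equal(te_m, te_l))
```

With 10 samples and 3 folds, the test folds have sizes 4, 3 and 3.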
split
split(X, y=None, groups=None)
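X: array of shape (n_samples, n_features), i.e. n_samples rows and n_features columns
y: array of shape (n_samples,), a 1-D array holding the target variable for supervised learning
groups: ignored by KFold; kept for API compatibility with group-aware splitters
Returns: a generator that yields one (train indices, test indices) pair per fold, each an array of row indices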
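A short sketch of what `split` actually produces, assuming a toy X with 5 samples: the result is a generator, each yielded pair holds two disjoint index arrays that together cover every row, and each sample lands in exactly one test fold.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(5, 2)  # 5 samples, 2 features
kf = KFold(n_splits=2)

splits = kf.split(X)             # a generator, not a list
print(type(splits).__name__)     # generator

all_test = []
for train_idx, test_idx in splits:
    # The two index arrays never overlap and together cover every row
    assert np.intersect1d(train_idx, test_idx).size == 0
    assert np.array_equal(np.union1d(train_idx, test_idx), np.arange(len(X)))
    print("train:", train_idx, "test:", test_idx)
    all_test.extend(test_idx.tolist())

# Each sample appears in exactly one test fold
print(sorted(all_test))  # [0, 1, 2, 3, 4]
```

Because it is a generator, `kf.split(X)` must be called again (or wrapped in `list(...)`) to iterate over the folds a second time.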
```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])  # 4 samples, 2 features
print(X.shape)
y = np.array([1, 2, 3, 4])  # one target value per sample
print(y.shape)

kf = KFold(n_splits=2)
kf.get_n_splits(X)  # returns the number of folds (2); the value is not printed here
print(kf)

# Each iteration yields the row indices of the training fold and the test fold
for i, (train_index, test_index) in enumerate(kf.split(X, y)):
    print(f"Fold {i}:")
    print(f" Train: index={train_index}")
    print(f" Test: index={test_index}")
    print(X[train_index], X[test_index])
```
Output:
```
(4, 2)
(4,)
KFold(n_splits=2, random_state=None, shuffle=False)
Fold 0:
 Train: index=[2 3]
 Test: index=[0 1]
[[5 6]
 [7 8]] [[1 2]
 [3 4]]
Fold 1:
 Train: index=[0 1]
 Test: index=[2 3]
[[1 2]
 [3 4]] [[5 6]
 [7 8]]
```
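The printed repr shows the defaults `shuffle=False` and `random_state=None`, which is why the folds above are contiguous blocks of rows. A small sketch, assuming the same toy X, of enabling shuffling with a fixed seed: the rows are permuted before the folds are cut, and the integer `random_state` makes that permutation reproducible across calls to `split`.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# shuffle=True permutes the sample order before cutting folds;
# a fixed integer random_state makes that permutation reproducible.
kf = KFold(n_splits=2, shuffle=True, random_state=0)
print(kf)

first = [test.tolist() for _, test in kf.split(X)]
second = [test.tolist() for _, test in kf.split(X)]
print(first)
print(first == second)  # True: same seed, same folds
```

The exact fold membership depends on the seed; the folds still partition all four samples.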
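Interpretation: split works on row indices, not on the values in X or y. Indexing X and y with the same index arrays keeps each sample paired with its target, so in effect the rows of X and y are split together. KFold itself only uses the number of samples; y does not influence how the folds are formed.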
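A sketch of the interpretation above, assuming the same toy X and y: passing y or omitting it yields identical folds, and indexing both arrays with the same indices keeps each row of X aligned with its target.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 3, 4])
kf = KFold(n_splits=2)

with_y = list(kf.split(X, y))
without_y = list(kf.split(X))

# KFold only looks at the number of samples, so y does not change the folds
same = all(np.array_equal(a_tr, b_tr) and np.array_equal(a_te, b_te)
           for (a_tr, a_te), (b_tr, b_te) in zip(with_y, without_y))
print(same)  # True

# Using the same index arrays for X and y keeps each row paired with its target
for train_idx, test_idx in with_y:
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    print(X_test.tolist(), y_test.tolist())
```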