Python programming exercise: using a linear regression model to predict board feet of lumber from the diameter of ponderosa pine

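Python YX ⋅ posted 2020-05-13 18:35:43 ⋅ last reply by 姜姜 2020-05-20 14:56:43 ⋅ 363 views
Using the ponderosa pine data in the figure below, fit linear regression models to predict the amount of lumber (board feet).
  • Enter the data from the table and visualize it
  • Fit three linear regression models, using the diameter, the square of the diameter, and the cube of the diameter as the predictor, respectively
  • Evaluate the error of the three models (you may hold out some of the known data points as an evaluation benchmark)
  • You may use the linear regression model in sklearn
  • Try to explain why the errors of the three models differ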
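
A minimal sketch of the steps listed above, assuming the table has already been typed into two arrays. The helper name `fit_power_models` is hypothetical, and the demo data is synthetic (generated from a cubic law plus noise), not the ponderosa pine table. Only `LinearRegression`, `train_test_split`, and `mean_squared_error` are actual scikit-learn calls.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def fit_power_models(diameter, board_feet, test_size=0.3, seed=0):
    """Fit y ~ d, y ~ d^2 and y ~ d^3; return the held-out RMSE per power.

    Hypothetical helper for illustration, not code from the thread.
    """
    d = np.asarray(diameter, dtype=float).reshape(-1, 1)  # sklearn wants 2-D features
    y = np.asarray(board_feet, dtype=float)
    # One split shared by all three models so their errors are comparable.
    d_train, d_test, y_train, y_test = train_test_split(
        d, y, test_size=test_size, random_state=seed
    )
    rmse = {}
    for power in (1, 2, 3):
        model = LinearRegression().fit(d_train ** power, y_train)
        pred = model.predict(d_test ** power)
        rmse[power] = float(np.sqrt(mean_squared_error(y_test, pred)))
    return rmse


if __name__ == "__main__":
    # Synthetic stand-in data (NOT the ponderosa pine table).
    rng = np.random.default_rng(0)
    demo_d = np.linspace(15.0, 45.0, 20)
    demo_y = 0.004 * demo_d ** 3 + rng.normal(0.0, 5.0, size=demo_d.size)
    for power, err in fit_power_models(demo_d, demo_y).items():
        print(f"diameter^{power}: held-out RMSE = {err:.2f}")
```

Sharing a single train/test split across the three fits is a design choice: it keeps the comparison from being driven by which rows happened to land in the test set.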

[image: ponderosa pine data table]

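Reference

数学建模(原书第5版), 华章数学译丛 (Mathematical Modeling, translated from the 5th edition of the original; Huazhang Mathematics Translation Series)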

Replies: 8
  • RickyChen GKFCCCCCCCCCCCCCCCCCCCCCC
    2020-05-17 12:54:27
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    
    # Column names '直径' (diameter) and '木料' (board feet) are those used in rawdata.xlsx
    data = pd.read_excel('rawdata.xlsx')
    x = data['直径'].to_numpy(dtype=float).reshape(-1, 1)  # sklearn expects a 2-D feature matrix
    y = data['木料'].to_numpy(dtype=float)
    
    colors = ['green', 'red', 'black']
    for power, color in zip((1, 2, 3), colors):
        feature = x ** power  # diameter, diameter^2, diameter^3
        model = LinearRegression().fit(feature, y)  # a fresh model for each fit
        plt.subplot(3, 1, power)
        plt.scatter(feature, y, marker='o', label='data')
        plt.plot(feature, model.predict(feature), color=color, label='fit')
        plt.xlabel(f'Diameter^{power}')
        plt.ylabel('Board feet')
        plt.legend()
        # R^2 on the same data the model was fitted to (in-sample)
        print(model.score(feature, y))
    plt.tight_layout()
    plt.show()
    
  • RickyChen GKFCCCCCCCCCCCCCCCCCCCCCC
    2020-05-17 12:55:16

    I only got it working and did not pay attention to coding style, so please go easy on me.

  • RickyChen GKFCCCCCCCCCCCCCCCCCCCCCC
    2020-05-17 13:07:29

    0.9534194398757392

    0.9755173917668096

    0.9767573373910827
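
The three scores above come from `model.score(X, Y)` evaluated on the same rows the model was fitted to, so they are in-sample R² values. A hedged sketch of one way to get an out-of-sample figure instead, using leave-one-out cross-validation. The helper name `loo_r2` is hypothetical and the demo data is synthetic, not the ponderosa pine table.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict


def loo_r2(diameter, board_feet, power):
    """Return (in-sample R^2, leave-one-out R^2) for y ~ diameter**power.

    Hypothetical helper for illustration, not code from the thread.
    """
    x = np.asarray(diameter, dtype=float).reshape(-1, 1) ** power
    y = np.asarray(board_feet, dtype=float)
    in_sample = LinearRegression().fit(x, y).score(x, y)
    # Each point is predicted by a model fitted on all the other points.
    held_out_pred = cross_val_predict(LinearRegression(), x, y, cv=LeaveOneOut())
    return float(in_sample), float(r2_score(y, held_out_pred))


if __name__ == "__main__":
    # Synthetic stand-in data (NOT the ponderosa pine table).
    rng = np.random.default_rng(1)
    demo_d = np.linspace(15.0, 45.0, 20)
    demo_y = 0.004 * demo_d ** 3 + rng.normal(0.0, 5.0, size=demo_d.size)
    for power in (1, 2, 3):
        fit_r2, cv_r2 = loo_r2(demo_d, demo_y, power)
        print(f"diameter^{power}: in-sample R^2 = {fit_r2:.4f}, "
              f"leave-one-out R^2 = {cv_r2:.4f}")
```

For ordinary least squares the leave-one-out figure cannot exceed the in-sample one, so the gap between the two gives a rough sense of how optimistic the in-sample score is.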

  • RickyChen GKFCCCCCCCCCCCCCCCCCCCCCC
    2020-05-17 13:08:14

    [image]

  • 热心市民小杨
    2020-05-17 20:37:37
    import numpy as np
    from sklearn import linear_model
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score
    import matplotlib.pyplot as plt
    
    def loadDataSet(filename):
        """Read a whitespace-separated text file: last column is the true value, the rest are features."""
        xArry = []
        yArry = []
        with open(filename) as rstream:
            for fileline in rstream:
                curLine = fileline.split()
                if not curLine:  # skip blank lines
                    continue
                # Every column except the last is a feature; the target must not leak into X
                xArry.append([float(v) for v in curLine[:-1]])
                yArry.append(float(curLine[-1]))
        return xArry, yArry
    
    def calRegrcoef():
        xArr, yArr = loadDataSet('tree.txt')
        xArr = np.array(xArr)
        yArr = np.array(yArr)
        # X_train/X_test: training/test features; Y_train/Y_test: training/test true values.
        # No random_state is set, so the split (and therefore the scores) changes from run to run.
        X_train, X_test, Y_train, Y_test = train_test_split(xArr, yArr, test_size=0.3)
        regr = linear_model.LinearRegression()
        regr.fit(X_train, Y_train)
        # Coefficient and intercept of the model that uses the diameter
        m_coef = regr.coef_
        m_inter = regr.intercept_
        y_pred = regr.predict(X_test)
        m_mesquer = r2_score(Y_test, y_pred)
        print('r2_score of the diameter model:', m_mesquer)
        regr.fit(X_train ** 2, Y_train)
        # Coefficient and intercept of the model that uses the diameter squared
        msquare_coef = regr.coef_
        msquare_inter = regr.intercept_
        y_pred = regr.predict(X_test ** 2)
        ms_mesquer = r2_score(Y_test, y_pred)
        print('r2_score of the diameter-squared model:', ms_mesquer)
        regr.fit(X_train ** 3, Y_train)
        # Coefficient and intercept of the model that uses the diameter cubed
        mcube_coef = regr.coef_
        mcube_inte = regr.intercept_
        y_pred = regr.predict(X_test ** 3)
        mc_mesquer = r2_score(Y_test, y_pred)
        print('r2_score of the diameter-cubed model:', mc_mesquer)
        return m_coef, m_inter, msquare_coef, msquare_inter, mcube_coef, mcube_inte
    
    def plotDataSet():
        m_coef, m_inter, msquare_coef, msquare_inter, mcube_coef, mcube_inte = calRegrcoef()
        xArr, yArr = loadDataSet('tree.txt')
        xArr = np.array(xArr)
        # Fitted lines over the full data set, one per transformed feature
        y1 = m_coef * xArr + m_inter
        y2 = msquare_coef * (xArr ** 2) + msquare_inter
        y3 = mcube_coef * (xArr ** 3) + mcube_inte
        plt.subplot(3, 1, 1)
        plt.plot(xArr, y1, 'r')
        plt.scatter(xArr, yArr)
        plt.subplot(3, 1, 2)
        plt.plot(xArr ** 2, y2, 'g')
        plt.scatter(xArr ** 2, yArr)
        plt.subplot(3, 1, 3)
        plt.plot(xArr ** 3, y3, 'b')
        plt.scatter(xArr ** 3, yArr)
        plt.show()
    
    if __name__ == '__main__':
        plotDataSet()
  • 热心市民小杨
    2020-05-17 20:38:09

    Plot output:

    [image: plots of the three fitted models]

  • 热心市民小杨
    2020-05-17 20:41:06

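    Performance is evaluated with the regression metric r2_score (coefficient of determination). The results:
    r2_score of the diameter model: 0.9596851274836486
    r2_score of the diameter-squared model: 0.9817400584252329
    r2_score of the diameter-cubed model: 0.981149361657007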
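
Because `train_test_split` is called without `random_state`, each run draws a different 70/30 split, and the gap between the squared and cubed models above (about 0.0006) is small enough that another split could plausibly reverse the order. A sketch of one way to reduce that noise, assuming that averaging over many random splits is acceptable for this exercise. `mean_split_r2` is a hypothetical helper and the demo data is synthetic, not the contents of `tree.txt`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score


def mean_split_r2(diameter, board_feet, power, n_splits=200, seed=0):
    """Mean and standard deviation of test-set R^2 over repeated random 70/30 splits.

    Hypothetical helper for illustration, not code from the thread.
    """
    x = np.asarray(diameter, dtype=float).reshape(-1, 1) ** power
    y = np.asarray(board_feet, dtype=float)
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.3, random_state=seed)
    scores = cross_val_score(LinearRegression(), x, y, cv=splitter, scoring="r2")
    return float(scores.mean()), float(scores.std())


if __name__ == "__main__":
    # Synthetic stand-in data (NOT the ponderosa pine table).
    rng = np.random.default_rng(2)
    demo_d = np.linspace(15.0, 45.0, 20)
    demo_y = 0.004 * demo_d ** 3 + rng.normal(0.0, 5.0, size=demo_d.size)
    for power in (1, 2, 3):
        mean_r2, std_r2 = mean_split_r2(demo_d, demo_y, power)
        print(f"diameter^{power}: mean R^2 = {mean_r2:.4f} (std {std_r2:.4f})")
```

If the standard deviation across splits is larger than the difference between two models' mean scores, a single split is not enough to rank them.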

  • 姜姜
    2020-05-20 14:56:43

    MSE1: 376.97123733027377
    RMSE1: 19.415747148391528
    MSE2: 229.7507143640461
    RMSE2: 15.157529955901328
    MSE3: 204.24186553884664
    RMSE3: 14.291321336351185

    [image]
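
The reply above gives only the numbers. Each RMSE is the square root of the corresponding MSE (for example, √376.971 ≈ 19.416). A minimal sketch of how such figures can be computed with scikit-learn follows. The reply does not say whether the errors were measured on training data or held-out data, so this sketch assumes the fitted data itself, and `mse_rmse` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def mse_rmse(diameter, board_feet, power):
    """Fit y ~ diameter**power and return (MSE, RMSE) on the fitted data.

    Hypothetical helper for illustration, not code from the thread.
    """
    x = np.asarray(diameter, dtype=float).reshape(-1, 1) ** power
    y = np.asarray(board_feet, dtype=float)
    model = LinearRegression().fit(x, y)
    mse = float(mean_squared_error(y, model.predict(x)))
    return mse, float(np.sqrt(mse))  # RMSE is in the same units as y


if __name__ == "__main__":
    # Synthetic stand-in data (NOT the ponderosa pine table).
    rng = np.random.default_rng(3)
    demo_d = np.linspace(15.0, 45.0, 20)
    demo_y = 0.004 * demo_d ** 3 + rng.normal(0.0, 5.0, size=demo_d.size)
    for power in (1, 2, 3):
        mse, rmse = mse_rmse(demo_d, demo_y, power)
        print(f"MSE{power}: {mse:.3f}  RMSE{power}: {rmse:.3f}")
```

RMSE is often easier to read than MSE here because it is expressed in board feet rather than board feet squared.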
