
Six Ways to Create a Correlation Matrix in Python

Published: 2023-09-30 21:04:18 Author: compiled by a site user

A correlation matrix is a basic tool of data analysis. It shows how different variables relate to one another. Python offers many ways to compute a correlation matrix, and this article summarizes them.

Pandas

A Pandas DataFrame can create a correlation matrix directly with its corr method. Since most people in data science use Pandas to load their data, this is usually one of the quickest and simplest ways to check the correlations in a dataset.

 import pandas as pd
 import seaborn as sns
 
 data = sns.load_dataset('mpg')
 correlation_matrix = data.corr(numeric_only=True)
 correlation_matrix
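
As a minimal sketch of the same call on synthetic data (the column names and values below are invented for illustration and are not part of the mpg dataset), corr also accepts a method argument, so a rank-based coefficient can be requested the same way:

```python
import numpy as np
import pandas as pd

# Synthetic data, invented for illustration only
rng = np.random.default_rng(0)
x = rng.normal(size=200)
toy = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),  # strongly related to x
    "z": rng.normal(size=200),                      # unrelated noise
})

pearson = toy.corr()                    # default method is Pearson
spearman = toy.corr(method="spearman")  # rank-based alternative
print(pearson.round(2))
```

Pearson measures linear association, while Spearman works on ranks and may be the better choice when the relationship is monotonic but not linear.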

If you work in statistics or analytics, you may ask, "Where are the p-values?" We cover that at the end.

Numpy

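Numpy also includes a function for computing the correlation matrix, which we can call directly. Because it returns an ndarray, the result is not as readable as the pandas output.

 import numpy as np
 from sklearn.datasets import load_iris
 
 iris = load_iris()
 # rowvar=False treats each column (feature) as a variable;
 # the default would correlate the 150 rows with one another
 np.corrcoef(iris["data"], rowvar=False)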
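
One way to make the ndarray easier to read is to wrap it in a labeled DataFrame. The sketch below uses a random array as a stand-in for a samples-by-features matrix, and the feature names are hypothetical:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a (samples x features) array such as iris["data"]
rng = np.random.default_rng(1)
samples = rng.normal(size=(150, 4))
names = ["f0", "f1", "f2", "f3"]  # hypothetical feature names

# rowvar=False: columns are variables, rows are observations
corr = np.corrcoef(samples, rowvar=False)

# Wrap the bare ndarray so rows and columns carry labels
labeled = pd.DataFrame(corr, index=names, columns=names)
print(labeled.round(2))
```

With the default rowvar=True, the same input would produce a 150 x 150 matrix that correlates the samples rather than the features.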

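For a better visualization, we can pass the correlation matrix directly to the sns.heatmap() function.

 import seaborn as sns
 
 data = sns.load_dataset('mpg')
 # numeric_only=True skips the text columns in this dataset
 correlation_matrix = data.corr(numeric_only=True)
 
 sns.heatmap(correlation_matrix,
             annot=True,
             cmap='coolwarm')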

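The annot=True parameter prints the correlation value inside each cell. A common hack is to call sns.set_context('talk') to make the output easier to read.

That context is intended for images in slide presentations, and its larger fonts make the plot easier to read.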
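
Because a correlation matrix is symmetric, half of the annotated cells repeat the other half. A possible refinement, sketched here on synthetic data with made-up column names, is to build a boolean mask for the upper triangle. The same array could be passed to sns.heatmap through its mask parameter; the sketch only previews the effect with pandas:

```python
import numpy as np
import pandas as pd

# Synthetic data, invented for illustration only
rng = np.random.default_rng(2)
toy = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd"))
corr = toy.corr()

# True above the diagonal: those cells mirror the lower triangle
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Blank out the redundant cells to preview what the heatmap would show
lower = corr.where(~mask)
print(lower.round(2))
```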

Statsmodels

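The Statsmodels statistical analysis library can of course do this as well.

 import statsmodels.api as sm
 
 # Restrict to numeric columns so the axis labels match the matrix
 correlation_matrix = data.corr(numeric_only=True)
 
 fig = sm.graphics.plot_corr(
     correlation_matrix,
     xnames=correlation_matrix.columns.tolist())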

plotly

By default, plotly draws the 1.0 diagonal running from the bottom left to the top right. This is the opposite of most other tools, so take particular care if you use plotly.

 import plotly.offline as pyo
 pyo.init_notebook_mode(connected=True)
 
 import plotly.figure_factory as ff
 
 correlation_matrix = data.corr(numeric_only=True)
 
 fig = ff.create_annotated_heatmap(
    z=correlation_matrix.values,
    x=list(correlation_matrix.columns),
    y=list(correlation_matrix.index),
    colorscale='Blues')
 
 fig.show()
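
If the conventional orientation is preferred, one option is to reverse the row order of the matrix before handing z, x, and y to ff.create_annotated_heatmap. This is only a sketch on synthetic data with made-up column names, not a prescribed plotly approach, and it stops at preparing the inputs:

```python
import numpy as np
import pandas as pd

# Synthetic data, invented for illustration only
rng = np.random.default_rng(3)
toy = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
corr = toy.corr()

# Reverse the rows: drawn bottom-up, the first variable then ends up on top
flipped = corr.iloc[::-1]

z = flipped.values
x = list(flipped.columns)
y = list(flipped.index)
print(y)
```

Since the first row of z is drawn at the bottom, the reversed order places the 1.0 entries on a line from the top left to the bottom right.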

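Pandas + Matplotlib for better visualization

The same result can also be produced directly with sns.pairplot(data). The two approaches give nearly identical plots, but seaborn needs only one line:

 sns.pairplot(data[['mpg','weight','horsepower','acceleration']])

So here we show how to achieve it with Matplotlib.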

 import matplotlib.pyplot as plt
 
 pd.plotting.scatter_matrix(
    data, alpha=0.2,
    figsize=(6, 6),
    diagonal='hist')
 
 plt.show()

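p-values for the correlations

If you are looking for a simple matrix that includes p-values, which is what many other tools (SPSS, Stata, R, SAS, etc.) produce by default, how do you get one in Python?

This is where the scientific computing library scipy comes in. The function below implements it.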

 from scipy.stats import pearsonr
 import pandas as pd
 import seaborn as sns
 
 def corr_full(df, numeric_only=True, rows=['corr', 'p-value', 'obs']):
    """
    Generates a correlation matrix with correlation coefficients,
    p-values, and observation count.
     
    Args:
    - df:                 Input dataframe
    - numeric_only (bool): Whether to consider only numeric columns for
                            correlation. Default is True.
    - rows:               Determines the information to show.
                            Default is ['corr', 'p-value', 'obs'].
     
    Returns:
    - formatted_table: The correlation matrix with the specified rows.
    """
     
    # Calculate Pearson correlation coefficients
    corr_matrix = df.corr(
        numeric_only=numeric_only)
     
    # Calculate the p-values using scipy's pearsonr
    pvalue_matrix = df.corr(
        numeric_only=numeric_only,
        method=lambda x, y: pearsonr(x, y)[1])
     
    # Calculate the non-null observation count for each column in the matrix
    obs_count = df[corr_matrix.columns].apply(lambda x: x.notnull().sum())
     
    # Calculate observation count for each pair of columns
    obs_matrix = pd.DataFrame(
        index=corr_matrix.columns, columns=corr_matrix.columns)
    for col1 in obs_count.index:
        for col2 in obs_count.index:
            obs_matrix.loc[col1, col2] = min(obs_count[col1], obs_count[col2])
         
    # Create a multi-index dataframe to store the formatted correlations
    formatted_table = pd.DataFrame(
        index=pd.MultiIndex.from_product([corr_matrix.columns, rows]),
        columns=corr_matrix.columns
    )
     
    # Assign values to the appropriate cells in the formatted table
    for col1 in corr_matrix.columns:
        for col2 in corr_matrix.columns:
            if 'corr' in rows:
                formatted_table.loc[
                    (col1, 'corr'), col2] = corr_matrix.loc[col1, col2]
             
            if 'p-value' in rows:
                # Skip p-values on the diagonal, where a column correlates perfectly with itself
                if col1 != col2:
                    formatted_table.loc[
                        (col1, 'p-value'), col2] = f"({pvalue_matrix.loc[col1, col2]:.4f})"
            if 'obs' in rows:
                formatted_table.loc[
                    (col1, 'obs'), col2] = obs_matrix.loc[col1, col2]
     
    return(formatted_table.fillna('')
            .style.set_properties(**{'text-align': 'center'}))
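
The function above estimates each pairwise observation count as the smaller of the two per-column counts. When missing values fall in different rows, that can overstate the number of rows actually shared by a pair. A minimal sketch of an exact pairwise count, using a made-up toy frame, multiplies the presence matrix by its transpose:

```python
import numpy as np
import pandas as pd

# Toy frame with missing values in different rows (values are made up)
toy = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0],
    "b": [2.0, np.nan, 3.0, 4.0, 6.0],
    "c": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# 1 where a value is present, 0 where it is missing
present = toy.notnull().astype(int)

# Entry (i, j) counts the rows where both column i and column j are present
pairwise_obs = present.T @ present
print(pairwise_obs)
```

Here columns a and b each have four non-null values but share only three complete rows, so the min-based estimate gives 4 while the exact count is 3.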

Calling the function directly returns the following result:

 df = sns.load_dataset('mpg')
 result = corr_full(df, rows=['corr', 'p-value'])
 result
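
To sanity-check a single cell of such a table, pearsonr can be called on one pair of columns. The sketch below uses synthetic arrays invented for illustration rather than the mpg data:

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic data, invented for illustration only
rng = np.random.default_rng(4)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)  # moderately related to x

# pearsonr returns the coefficient and the two-sided p-value
r, p = pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.4f}")
```

The coefficient should agree with the matching entry of np.corrcoef(x, y), and the p-value is the quantity that the function above formats in parentheses.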

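Summary

We have covered several ways to create a correlation matrix in Python; pick whichever is most convenient. The default output of most Python tools does not include p-values or observation counts, so if you need those statistics, you can use the function provided at the end of this article. For a thorough and complete correlation analysis, having p-values and observation counts for reference is very helpful.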
