Building a Machine Learning Model in Streamlit Using Python

What is Streamlit?

Streamlit is an easy-to-use, open-source app framework well suited to building data science applications. It is Python-based and has a simple, intuitive interface, which makes it a great tool for data scientists who want to put a machine learning (ML) model behind an interactive app quickly and easily.

In this tutorial, I'll show you how to build a machine learning model app using Streamlit and Python. I'm assuming you have Python 3.8 and Streamlit installed on your machine (if not, run pip install streamlit, or !pip install streamlit from a notebook).

Time to get coding!

First, we'll import Streamlit, pandas, numpy, sklearn, and a few other modules into our project. We need Streamlit to create the user interface for our ML model, pandas to load the data, and numpy and sklearn to build the ML model:

import pandas as pd
import numpy as np
import streamlit as st
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.ensemble import ExtraTreesClassifier
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

Next, we'll create the Streamlit app. First, we'll want to give our app a title. We can also customize the page title shown in the browser tab, along with the page layout, using "set_page_config". I also find it useful to add some documentation to the app so the user can get familiar with how things work.

#Setting the page title shown in the browser tab, and then a title for our app view
st.set_page_config(page_title='Streamlit_ML_Test', layout='wide')
st.title('Welcome to My First ML App!')

#This is a header that expands so the user can toggle it in and out of view
with st.expander("See ML App Documentation"):
    st.subheader('Directions')
    st.write('Write your directions here')

Now let's move on to the machine learning model itself. We'll wrap the logistic regression in a function to make the model easier to run later. I chose logistic regression here, but it can be swapped for any type of machine learning model. That said, if you are following along with a different model, feel free to remove the scaling and the validation set.

#Let's build a function to run the model
def Logistic_Regression(df):
    #Note: this reads the module-level variable `target`, which is set by
    #the sidebar form before the function is called
    #Setting features
    X = df.drop(target, axis=1)

    #Defining the target
    y = df[target]

    #Filling nulls: median for the features, most frequent class for the
    #target (a median could produce a value that is not a valid class label)
    X = X.fillna(X.median())
    y = y.fillna(y.mode().iloc[0])

    #Creating a train, test, validation split (about 60/20/20);
    #the test set is held out and not used below
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2)
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.25)

    #Scaling the data (fit on the training set only, then applied to validation)
    scaler = StandardScaler()
    scaled_X_train = scaler.fit_transform(X_train)
    scaled_X_val = scaler.transform(X_val)

    #Creating our logistic regression model and fitting it
    logreg = LogisticRegression()
    logreg.fit(scaled_X_train, y_train)

    #Predicting our target on the validation set
    y_pred = logreg.predict(scaled_X_val)

    #Creating our confusion matrix and classification report
    cm = confusion_matrix(y_val, y_pred)
    cr = classification_report(y_val, y_pred, output_dict=True)
    cr = pd.DataFrame(cr).transpose()

    #Writing our Confusion Matrix and Classification Report to view in the app
    st.write(cm)
    st.write(cr)

    #Bringing the accuracy score to view
    st.info(round(accuracy_score(y_val, y_pred, normalize=True), 2))
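
To see what the two chained train_test_split calls amount to, here is a minimal standalone sketch (outside Streamlit) on 100 dummy rows. The list `rows` and the fixed random_state are assumptions added for reproducibility; they are not part of the app. Taking 20% for the test set and then 25% of the remaining 80% for validation leaves roughly a 60/20/20 split:

```python
from sklearn.model_selection import train_test_split

# 100 dummy rows standing in for the uploaded dataset (illustrative only)
rows = list(range(100))

# First split: hold out 20% of all rows as the test set
train_val, test = train_test_split(rows, test_size=0.2, random_state=0)

# Second split: 25% of the remaining 80% becomes the validation set
train, val = train_test_split(train_val, test_size=0.25, random_state=0)

# Sizes of the training, validation, and test sets, in that order
split_sizes = (len(train), len(val), len(test))
print(split_sizes)
```

If you drop the validation set for a different model, a single call with test_size=0.2 gives a plain 80/20 split instead.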

Next, let's make an area where we can upload our .csv or .xlsx files, which will then become our dataframe.

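#Converting our dataset to UTF-8 encoded CSV bytes, cached between reruns
#(st.cache_data replaces st.experimental_memo in newer Streamlit versions)
@st.cache_data
def convert_df(df):
    return df.to_csv(index=False).encode('utf-8')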

#Uploading our file from the sidebar and reading it into a dataframe
with st.sidebar:
    st.header('Upload your CSV')
    uploaded_file = st.file_uploader(
        "Upload spreadsheet", type=["csv", "xlsx"])
    # Check if file was uploaded
    if uploaded_file:
        # Check MIME type of the uploaded file
        if uploaded_file.type == "text/csv":
            df = pd.read_csv(uploaded_file)
        else:
            # Reading .xlsx files requires the openpyxl package
            df = pd.read_excel(uploaded_file)

#Showing the dataset once a file is uploaded; otherwise showing a prompt
#instead of the error that would occur because df is not defined yet
st.subheader('Dataset')
if uploaded_file:
    st.write('Here is the dataset')
    st.write(df)
else:
    st.info('Please Upload Dataset')
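
The MIME type reported for a CSV upload can vary by browser and operating system, so a check against "text/csv" alone may send a CSV file down the Excel branch. One possible alternative, sketched here, is to pick the reader from the file name extension. The helper name `read_table` is hypothetical; in the app it could be called as read_table(uploaded_file.name, uploaded_file).

```python
import io
from pathlib import Path

import pandas as pd


def read_table(file_name, file_obj):
    """Read a CSV or Excel file-like object into a DataFrame.

    Hypothetical helper: picks the pandas reader from the file extension
    rather than from the MIME type reported by the browser.
    """
    suffix = Path(file_name).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(file_obj)
    if suffix == ".xlsx":
        # Reading .xlsx files requires the openpyxl package
        return pd.read_excel(file_obj)
    raise ValueError(f"Unsupported file type: {suffix or file_name}")


# Usage with an in-memory CSV standing in for an uploaded file
sample = io.BytesIO(b"id,score\n1,0.5\n2,0.9\n")
sample_df = read_table("Scores.CSV", sample)
print(sample_df.shape)
```

Lower-casing the suffix means "Scores.CSV" and "scores.csv" are treated the same way, and anything else fails with a clear message instead of an obscure parser error.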

Now that the model function is defined, we can update our Streamlit app to give it a live interface. We'll add a sidebar form with a select box for the unique identifier, a multiselect for the columns to drop, a select box for the target variable, and a Predict button; the results are then displayed in the main view:

#Once the file is uploaded the Streamlit form will begin
if uploaded_file:
    #Putting the user input in a form in the sidebar
    with st.sidebar.form(key="eve"):
        #User selects unique id from list of column headers
        st.header('Select Unique Identifier')
        unique_id = st.selectbox("Please select a unique identifier (Ex: Id, Account Name, Order Number)", df.columns)

        #User can select columns to drop
        st.header('Drop Columns if needed')
        st.write("If you do not need to drop any fields then leave this box empty. **Note:** Dates/Date times need to be removed from the dataset before running.")
        to_drop = st.multiselect('Drop Columns', df.columns)
        st.caption('Please select any columns that you would like to drop from the dataset (*Note:* The unique identifier you previously selected will be dropped automatically).')

        #Then user picks the target that they want to predict
        st.header('Define Target Variable')
        st.write('Please select the column that you would like to predict from the drop down menu.')
        target = st.selectbox('Target Variable', df.columns)
        st.caption('Your target variable will drop from the dataset once selected and then submitted.')

        #Finally hit submit and the model begins to run!
        submit = st.form_submit_button("Predict")
    
    #Once submitted this if statement will begin to process
    if submit:
        #Cleaning dataset
        u_id = df[unique_id]
        df = df.drop(unique_id, axis=1)
        #errors='ignore' avoids a KeyError if the unique id was also picked here
        df = df.drop(to_drop, axis=1, errors='ignore')
        st.write("Here is your cleaned dataset")
        st.write(df)

        #Building model using the function we built
        Logistic_Regression(df)

        #Allowing user the option to download their cleaned data
        st.caption('**Note:** downloading dataset will refresh page and clear results.')
        csv = convert_df(df)
        st.download_button(
           "Download Cleaned Data",
           csv,
           "CleanedDataset.csv",
           "text/csv",
           key='download-csv'
        )
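    else:
        #Nothing has been submitted yet, so prompt the user instead of running
        st.info('Choose your options in the sidebar, then press Predict.')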
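
Because the identifier, the dropped columns, and the target are all chosen from the same column list, some combinations cannot work, for example choosing the target as one of the columns to drop. A small pure-Python check, sketched below, could run right after the form is submitted. The function name `selection_problems`, the message wording, and the sample column names are hypothetical.

```python
def selection_problems(columns, unique_id, to_drop, target):
    """Return a list of human-readable problems with the form selections.

    Hypothetical helper: an empty list means the selections are usable.
    """
    problems = []
    if target == unique_id:
        problems.append("The target cannot also be the unique identifier.")
    if target in to_drop:
        problems.append("The target cannot be one of the dropped columns.")
    # Columns left as features once the id, drops, and target are removed
    removed = set(to_drop) | {unique_id, target}
    features = [c for c in columns if c not in removed]
    if not features:
        problems.append("No feature columns are left to train on.")
    return problems


columns = ["Id", "Age", "Income", "Churned"]
ok = selection_problems(columns, "Id", ["Income"], "Churned")
bad = selection_problems(columns, "Id", ["Churned"], "Churned")
print(ok)
print(bad)
```

In the app, each message could be shown with st.error and the model run skipped whenever the list is not empty.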

This code should help you build a reasonably capable Streamlit ML app using Python. It can be used as a starting point for creating even more complex ML applications.

Deploying the Model with GitHub

Now that we have the basic code for our app, we need to deploy it on Streamlit so that we can use it. First, open the Streamlit website and create an account. Then make a GitHub repo for your apps, for example User/Apps, and set the repo to private. Once your repository is created, add a file for your Python code, "ML_App.py". Finally, the easy part: go back to Streamlit ›› Log in ›› Connect GitHub Repo ›› Select User/Apps/ML_App.py and let Streamlit do the rest! As for making changes, all you need to do is update your .py file in Git ›› push to the main branch ›› refresh your Streamlit app, and your changes will appear.

Full Code for GitHub

import pandas as pd
import numpy as np
import streamlit as st
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.ensemble import ExtraTreesClassifier
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

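#Converting our dataset to UTF-8 encoded CSV bytes, cached between reruns
#(st.cache_data replaces st.experimental_memo in newer Streamlit versions)
@st.cache_data
def convert_df(df):
    return df.to_csv(index=False).encode('utf-8')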

#Setting the page title shown in the browser tab, and then a title for our app view
st.set_page_config(page_title='Streamlit_ML_Test', layout='wide')
st.title('Welcome to My First ML App!')

#This is a header that expands so the user can toggle it in and out of view
with st.expander("See ML App Documentation"):
    st.subheader('Directions')
    st.write('Write your directions here')
    
#Let's build a function to run the model
def Logistic_Regression(df):
    #Note: this reads the module-level variable `target`, which is set by
    #the sidebar form before the function is called
    #Setting features
    X = df.drop(target, axis=1)

    #Defining the target
    y = df[target]

    #Filling nulls: median for the features, most frequent class for the
    #target (a median could produce a value that is not a valid class label)
    X = X.fillna(X.median())
    y = y.fillna(y.mode().iloc[0])

    #Creating a train, test, validation split (about 60/20/20);
    #the test set is held out and not used below
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2)
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.25)

    #Scaling the data (fit on the training set only, then applied to validation)
    scaler = StandardScaler()
    scaled_X_train = scaler.fit_transform(X_train)
    scaled_X_val = scaler.transform(X_val)

    #Creating our logistic regression model and fitting it
    logreg = LogisticRegression()
    logreg.fit(scaled_X_train, y_train)

    #Predicting our target on the validation set
    y_pred = logreg.predict(scaled_X_val)

    #Creating our confusion matrix and classification report
    cm = confusion_matrix(y_val, y_pred)
    cr = classification_report(y_val, y_pred, output_dict=True)
    cr = pd.DataFrame(cr).transpose()

    #Writing our Confusion Matrix and Classification Report to view in the app
    st.write(cm)
    st.write(cr)

    #Bringing the accuracy score to view
    st.info(round(accuracy_score(y_val, y_pred, normalize=True), 2))

#Uploading our file from the sidebar and reading it into a dataframe
with st.sidebar:
    st.header('Upload your CSV')
    uploaded_file = st.file_uploader(
        "Upload spreadsheet", type=["csv", "xlsx"])
    # Check if file was uploaded
    if uploaded_file:
        # Check MIME type of the uploaded file
        if uploaded_file.type == "text/csv":
            df = pd.read_csv(uploaded_file)
        else:
            # Reading .xlsx files requires the openpyxl package
            df = pd.read_excel(uploaded_file)

#Showing the dataset once a file is uploaded; otherwise showing a prompt
#instead of the error that would occur because df is not defined yet
st.subheader('Dataset')
if uploaded_file:
    st.write('Here is the dataset')
    st.write(df)
else:
    st.info('Please Upload Dataset')

#Once the file is uploaded the Streamlit form will begin
if uploaded_file:
    #Putting the user input in a form in the sidebar
    with st.sidebar.form(key="eve"):
        #User selects unique id from list of column headers
        st.header('Select Unique Identifier')
        unique_id = st.selectbox("Please select a unique identifier (Ex: Id, Account Name, Order Number)", df.columns)

        #User can select columns to drop
        st.header('Drop Columns if needed')
        st.write("If you do not need to drop any fields then leave this box empty. **Note:** Dates/Date times need to be removed from the dataset before running.")
        to_drop = st.multiselect('Drop Columns', df.columns)
        st.caption('Please select any columns that you would like to drop from the dataset (*Note:* The unique identifier you previously selected will be dropped automatically).')

        #Then user picks the target that they want to predict
        st.header('Define Target Variable')
        st.write('Please select the column that you would like to predict from the drop down menu.')
        target = st.selectbox('Target Variable', df.columns)
        st.caption('Your target variable will drop from the dataset once selected and then submitted.')

        #Finally hit submit and the model begins to run!
        submit = st.form_submit_button("Predict")
    
    #Once submitted this if statement will begin to process
    if submit:
        #Cleaning dataset
        u_id = df[unique_id]
        df = df.drop(unique_id, axis=1)
        #errors='ignore' avoids a KeyError if the unique id was also picked here
        df = df.drop(to_drop, axis=1, errors='ignore')
        st.write("Here is your cleaned dataset")
        st.write(df)

        #Building model using the function we built
        Logistic_Regression(df)

        #Allowing user the option to download their cleaned data
        st.caption('**Note:** downloading dataset will refresh page and clear results.')
        csv = convert_df(df)
        st.download_button(
           "Download Cleaned Data",
           csv,
           "CleanedDataset.csv",
           "text/csv",
           key='download-csv'
        )
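    else:
        #Nothing has been submitted yet, so prompt the user instead of running
        st.info('Choose your options in the sidebar, then press Predict.')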

Conclusion

Streamlit is a powerful tool that makes it quick and easy to build ML applications. With a fairly small amount of code, we were able to create an ML model with a user interface. Try it out and explore the potential of Streamlit and machine learning!

Bonus Information

Building this app took about 125 lines of code, which for many programmers is nothing. If you get creative enough with this code, you can easily build an app that lets the user choose between 6 different machine learning models, preprocesses the data (null values, encoding, etc.), accounts for multicollinearity, and more. I encourage you to pick this code apart and make it your own.
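
As one possible starting point for those extensions, here is a minimal sketch that combines preprocessing (null filling and one-hot encoding) with a choice of model, using scikit-learn's Pipeline and ColumnTransformer. The `MODELS` dictionary, the `build_pipeline` helper, and the toy column names are all hypothetical; in the app, the model name could come from an st.selectbox.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical registry: display name -> factory for an unfitted estimator
MODELS = {
    "Logistic Regression": lambda: LogisticRegression(max_iter=1000),
    "Extra Trees": lambda: ExtraTreesClassifier(n_estimators=50, random_state=0),
}


def build_pipeline(model_name, X):
    """Build a preprocessing + model pipeline for the feature frame X."""
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = [c for c in X.columns if c not in numeric_cols]

    # Numeric columns: fill nulls with the median, then scale
    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    # Other columns: fill nulls with the most frequent value, then one-hot encode
    categorical_steps = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ])
    return Pipeline([("preprocess", preprocess), ("model", MODELS[model_name]())])


# Toy data with a missing value in each feature column (illustrative only)
X = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 38],
    "plan": ["basic", "pro", "basic", np.nan, "pro", "basic"],
})
y = pd.Series([0, 1, 0, 1, 1, 0])

pipeline = build_pipeline("Extra Trees", X)
pipeline.fit(X, y)
predictions = pipeline.predict(X)
print(len(predictions))
```

Keeping the imputers and the scaler inside the pipeline means they are fitted only on whatever data is passed to fit, so the same object can be trained on the training split and then applied unchanged to the validation split.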

As always, I hope you enjoyed this article and found it informative. If you did, please consider clapping and following. Have a great day, and I'll catch you in the next article.

Cheers!