Building a Machine Learning Model in Streamlit with Python

What Is Streamlit?

Streamlit is an easy-to-use, open-source app framework for building data applications. It is based on Python and has a simple, intuitive interface, which makes it a great tool for data scientists who want to quickly put an interactive front end on their machine learning (ML) models.

In this tutorial, I'll show you how to build a machine learning app with Streamlit and Python. I'm assuming you have Python 3.8 or later and Streamlit installed (if not, run pip install streamlit in a terminal; the leading ! is only needed inside a notebook cell).

Time to Code!

First, we import Streamlit, numpy, sklearn, and a few other modules into our project. We need Streamlit to build the user interface for our ML model, and pandas, numpy, and sklearn to handle the data and build the model itself:

import pandas as pd
import numpy as np
import streamlit as st
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.ensemble import ExtraTreesClassifier
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

Next, we'll create the Streamlit app. First, we want to give the app a title. We can also set the title shown in the browser tab, along with the page layout, using set_page_config. I also find it helpful to add some documentation to the app so the user can see how everything works.

#Setting the browser tab title and page layout, then the title shown in the app
st.set_page_config(page_title='Streamlit_ML_Test', layout='wide')
st.title('Welcome to My First ML App!')

#This is a header that expands so the user can toggle it in and out of view
with st.expander("See ML App Documentation"):
    st.subheader('Directions')
    st.write('Write your directions here')

Now let's move on to the machine learning model itself. We're going to wrap logistic regression in a function so the model is easy to run later. I chose logistic regression here, but it can be swapped for any other ML model. If you do use a different model, feel free to remove the scaling step and the validation set.

#Let's build a function to run the model
def logistic_regression(df):
    #Dropping rows where the target is missing; this relies on the
    #module-level `target` chosen in the sidebar form
    df = df.dropna(subset=[target])

    #Setting features
    X = df.drop(target, axis=1)

    #Defining target
    y = df[target]

    #Filling nulls in numeric features with the column median
    X = X.fillna(X.median(numeric_only=True))

    #Creating a train, test, validation split: 0.25 of the remaining 80%
    #is 20% of all rows, so the result is a 60/20/20 split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2)
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.25)

    #Scaling the data (the scaler is fit on the training rows only)
    scaler = StandardScaler()
    scaled_X_train = scaler.fit_transform(X_train)
    scaled_X_val = scaler.transform(X_val)

    #Creating our logistic regression model and fitting it
    logreg = LogisticRegression()
    logreg.fit(scaled_X_train, y_train)

    #Predicting our target
    y_pred = logreg.predict(scaled_X_val)

    #Creating our confusion matrix and classification report
    cm = confusion_matrix(y_val, y_pred)
    cr = classification_report(y_val, y_pred, output_dict=True)
    cr = pd.DataFrame(cr).transpose()

    #Writing our Confusion Matrix and Classification Report to view in the app
    st.write(cm)
    st.write(cr)

    #Bringing the accuracy score to view
    st.info(f"Validation accuracy: {accuracy_score(y_val, y_pred):.2f}")
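Since the function above hard-codes logistic regression, here is a minimal sketch of how the same steps could accept any scikit-learn classifier. The name evaluate_model, its parameters, and the synthetic demo data are my own assumptions for illustration, not part of the app. It returns plain Python objects instead of calling st.write, so it can be tried outside Streamlit.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_model(df, target, model=None, random_state=0):
    """Train `model` on a 60/20/20 split and score it on the validation set.

    Hypothetical helper: not part of the original app.
    """
    if model is None:
        model = LogisticRegression()

    # Rows without a label cannot be used for training or scoring.
    data = df.dropna(subset=[target])
    X = data.drop(columns=[target])
    X = X.fillna(X.median(numeric_only=True))
    y = data[target]

    # Hold out 20% for testing, then 25% of the rest (20% overall) for validation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state)
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.25, random_state=random_state)

    # The pipeline fits the scaler on the training rows only.
    pipeline = make_pipeline(StandardScaler(), clone(model))
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_val)

    return {
        "accuracy": accuracy_score(y_val, y_pred),
        "confusion_matrix": confusion_matrix(y_val, y_pred),
        "n_train": len(X_train),
        "n_val": len(X_val),
        "n_test": len(X_test),
    }


# Synthetic demo data (made up for illustration): the label depends on column "a".
rng = np.random.default_rng(0)
demo = pd.DataFrame({"a": rng.normal(size=200), "b": rng.normal(size=200)})
demo["label"] = (demo["a"] > 0).astype(int)

result = evaluate_model(demo, "label")
```

Passing a different estimator, for example model=RandomForestClassifier() from sklearn.ensemble, would reuse the same split and metrics. Inside the app, the returned values could then be shown with st.write.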

Next, let's create an area where we can upload our .csv or .xlsx file, which then becomes our DataFrame. The convert_df helper at the top encodes a DataFrame as UTF-8 CSV for the download button we use later.

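#Encoding our dataset as UTF-8 CSV and caching the result
#(st.cache_data replaces the deprecated st.experimental_memo in Streamlit 1.18+)
@st.cache_data
def convert_df(df):
    return df.to_csv(index=False).encode('utf-8')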

#Uploading our file and bringing it to view
with st.sidebar:
    st.header('Upload your CSV')
    uploaded_file = st.file_uploader(
        "Upload spreadsheet", type=["csv", "xlsx"])
    # Check if file was uploaded
    if uploaded_file:
        # Pick the reader from the file extension; the MIME type reported
        # for CSV files can vary between browsers
        if uploaded_file.name.lower().endswith(".csv"):
            df = pd.read_csv(uploaded_file)
        else:
            # Reading .xlsx requires an Excel engine such as openpyxl
            df = pd.read_excel(uploaded_file)

#Formatting the dataset view and preventing an error message that automatically
#Will populate if the user doesn't submit a file
st.subheader('Dataset')
if uploaded_file:
    st.write('Here is the dataset')
    st.write(df)
else:
    st.info('Please Upload Dataset')
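The upload step decides how to parse the file based on its type. As a rough sketch, that decision can be pulled into a small helper that works on any file-like object, which makes it easy to try without a running app. The function name read_uploaded_table and the sample data are assumptions of mine, not something from the app.

```python
import io

import pandas as pd


def read_uploaded_table(file_name, file_obj):
    """Read a CSV or Excel upload into a DataFrame, chosen by file extension.

    Hypothetical helper; `file_obj` can be any file-like object, such as the
    value returned by st.file_uploader.
    """
    lower_name = file_name.lower()
    if lower_name.endswith(".csv"):
        return pd.read_csv(file_obj)
    if lower_name.endswith((".xlsx", ".xls")):
        # pandas needs an Excel engine such as openpyxl for .xlsx files.
        return pd.read_excel(file_obj)
    raise ValueError(f"Unsupported file type: {file_name}")


# Stand-in for an uploaded file, so the helper can be tried without Streamlit.
fake_upload = io.BytesIO(b"id,age,churned\n1,34,0\n2,51,1\n")
table = read_uploaded_table("Customers.CSV", fake_upload)
```

In the app, this would be called as read_uploaded_table(uploaded_file.name, uploaded_file).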

Now that the model function and the uploader are in place, we can give the app an interactive interface. We'll add a sidebar form with select boxes for the unique identifier, the columns to drop, and the target variable, plus a Predict button that cleans the data, runs the model, and displays the results:

#Once the file is uploaded streamlit form will begin
if uploaded_file:
    #Putting the user input in a sidebar form; everything inside the
    #with block belongs to the form and is sent together on submit
    with st.sidebar.form(key="eve"):
        #User selects unique id from list of column headers
        st.header('Select Unique Identifier')
        unique_id = st.selectbox("Please select a unique identifier (Ex:Id, Account Name, Order Number)",df.columns)

        #User can select columns to drop
        st.header('Drop Columns if needed')
        st.write("If you do not need to drop any fields then leave this box empty. **Note:** Dates/Date times need to be removed from the dataset before running.")
        to_drop = st.multiselect('Drop Columns', df.columns)
        st.caption('Please select any columns that you would like to drop from the dataset (*Note:* The unique identifier you previously selected will be dropped automatically).')

        #Then user picks the target that they want to predict
        st.header('Define Target Variable')
        st.write('Please select the column that you would like to predict from the drop down menu.')
        target = st.selectbox('Target Variable', df.columns)
        st.caption('Your target variable will drop from the dataset once selected and then submitted.')

        #Finally hit submit and the model begins to run!
        submit = st.form_submit_button("Predict")
    
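    #Once submitted this if statement will begin to process
    if submit:
        #The target must survive the cleaning step below
        if target == unique_id or target in to_drop:
            st.error('The target variable cannot also be the unique identifier or a dropped column.')
        else:
            #Cleaning dataset
            u_id = df[unique_id]
            df = df.drop(unique_id, axis=1)
            #errors='ignore' covers the identifier also being picked in the drop list
            df = df.drop(to_drop, axis=1, errors='ignore')
            st.write("Here is your cleaned dataset")
            st.write(df)

            #Building model using the function we built
            logistic_regression(df)

            #Allowing user the option to download their cleaned data
            st.caption('**Note:** downloading dataset will refresh page and clear results.')
            csv = convert_df(df)
            st.download_button(
               "Download Cleaned Data",
               csv,
               "CleanedDataset.csv",
               "text/csv",
               key='download-csv'
            )
    else:
        #Nothing has been submitted yet, so prompt the user instead of printing to the console
        st.info('Choose your columns in the sidebar and press Predict to run the model.')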
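The cleaning step above drops the identifier and any extra columns before the model runs. Here is a minimal sketch of that step as a standalone function that also checks the target is not among the dropped columns. The name clean_dataset and the sample order data are assumptions of mine for illustration.

```python
import pandas as pd


def clean_dataset(df, unique_id, to_drop, target):
    """Drop the identifier and unwanted columns while keeping the target.

    Hypothetical helper. Raises ValueError when the target would be removed,
    which would otherwise surface later as a KeyError in the model code.
    """
    columns_to_remove = {unique_id, *to_drop}
    if target in columns_to_remove:
        raise ValueError(
            f"Target column '{target}' cannot also be dropped "
            "or used as the identifier.")
    # errors="ignore" tolerates a column that is listed but not present.
    return df.drop(columns=list(columns_to_remove), errors="ignore")


# Made-up sample data for illustration.
raw = pd.DataFrame({
    "order_id": [101, 102, 103],
    "order_date": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "amount": [20.0, 35.5, 12.25],
    "returned": [0, 1, 0],
})

cleaned = clean_dataset(raw, unique_id="order_id",
                        to_drop=["order_date", "order_id"], target="returned")
```

In a Streamlit app, the ValueError could be caught and shown to the user with st.error rather than stopping the script.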

This code should help you build a reasonably advanced Streamlit ML app with Python. To try it locally, save it as ML_App.py and run streamlit run ML_App.py from a terminal. You can use it as a starting point for even more complex machine learning applications.

Deploying the Model with GitHub

Now that we have the code for our app, we need to deploy it on Streamlit's hosting service so we can use it. First, go to the Streamlit website and create an account. Next, create a GitHub repository for your apps, for example User/Apps, and switch the repository to private. Once the repository exists, add a file for your Python code, ML_App.py, along with a requirements.txt that lists the packages the app imports so the hosting service knows what to install. Finally, the easy part: go back to Streamlit ›› Sign in ›› Connect your GitHub repository ›› select User/Apps/ML_App.py, and let Streamlit do the rest! To make changes, all you need to do is update your .py file in Git ›› push to the main branch ›› refresh the Streamlit app, and your changes will be live.

Full Code for GitHub

import pandas as pd
import numpy as np
import streamlit as st
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.ensemble import ExtraTreesClassifier
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

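#Encoding our dataset as UTF-8 CSV and caching the result
#(st.cache_data replaces the deprecated st.experimental_memo in Streamlit 1.18+)
@st.cache_data
def convert_df(df):
    return df.to_csv(index=False).encode('utf-8')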

#Setting the browser tab title and page layout, then the title shown in the app
st.set_page_config(page_title='Streamlit_ML_Test', layout='wide')
st.title('Welcome to My First ML App!')

#This is a header that expands so the user can toggle it in and out of view
with st.expander("See ML App Documentation"):
    st.subheader('Directions')
    st.write('Write your directions here')
    
#Let's build a function to run the model
def logistic_regression(df):
    #Dropping rows where the target is missing; this relies on the
    #module-level `target` chosen in the sidebar form
    df = df.dropna(subset=[target])

    #Setting features
    X = df.drop(target, axis=1)

    #Defining target
    y = df[target]

    #Filling nulls in numeric features with the column median
    X = X.fillna(X.median(numeric_only=True))

    #Creating a train, test, validation split: 0.25 of the remaining 80%
    #is 20% of all rows, so the result is a 60/20/20 split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2)
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.25)

    #Scaling the data (the scaler is fit on the training rows only)
    scaler = StandardScaler()
    scaled_X_train = scaler.fit_transform(X_train)
    scaled_X_val = scaler.transform(X_val)

    #Creating our logistic regression model and fitting it
    logreg = LogisticRegression()
    logreg.fit(scaled_X_train, y_train)

    #Predicting our target
    y_pred = logreg.predict(scaled_X_val)

    #Creating our confusion matrix and classification report
    cm = confusion_matrix(y_val, y_pred)
    cr = classification_report(y_val, y_pred, output_dict=True)
    cr = pd.DataFrame(cr).transpose()

    #Writing our Confusion Matrix and Classification Report to view in the app
    st.write(cm)
    st.write(cr)

    #Bringing the accuracy score to view
    st.info(f"Validation accuracy: {accuracy_score(y_val, y_pred):.2f}")

#Uploading our file and bringing it to view
with st.sidebar:
    st.header('Upload your CSV')
    uploaded_file = st.file_uploader(
        "Upload spreadsheet", type=["csv", "xlsx"])
    # Check if file was uploaded
    if uploaded_file:
        # Pick the reader from the file extension; the MIME type reported
        # for CSV files can vary between browsers
        if uploaded_file.name.lower().endswith(".csv"):
            df = pd.read_csv(uploaded_file)
        else:
            # Reading .xlsx requires an Excel engine such as openpyxl
            df = pd.read_excel(uploaded_file)

#Formatting the dataset view and preventing an error message that automatically
#Will populate if the user doesn't submit a file
st.subheader('Dataset')
if uploaded_file:
    st.write('Here is the dataset')
    st.write(df)
else:
    st.info('Please Upload Dataset')

#Once the file is uploaded streamlit form will begin
if uploaded_file:
    #Putting the user input in a sidebar form; everything inside the
    #with block belongs to the form and is sent together on submit
    with st.sidebar.form(key="eve"):
        #User selects unique id from list of column headers
        st.header('Select Unique Identifier')
        unique_id = st.selectbox("Please select a unique identifier (Ex:Id, Account Name, Order Number)",df.columns)

        #User can select columns to drop
        st.header('Drop Columns if needed')
        st.write("If you do not need to drop any fields then leave this box empty. **Note:** Dates/Date times need to be removed from the dataset before running.")
        to_drop = st.multiselect('Drop Columns', df.columns)
        st.caption('Please select any columns that you would like to drop from the dataset (*Note:* The unique identifier you previously selected will be dropped automatically).')

        #Then user picks the target that they want to predict
        st.header('Define Target Variable')
        st.write('Please select the column that you would like to predict from the drop down menu.')
        target = st.selectbox('Target Variable', df.columns)
        st.caption('Your target variable will drop from the dataset once selected and then submitted.')

        #Finally hit submit and the model begins to run!
        submit = st.form_submit_button("Predict")
    
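    #Once submitted this if statement will begin to process
    if submit:
        #The target must survive the cleaning step below
        if target == unique_id or target in to_drop:
            st.error('The target variable cannot also be the unique identifier or a dropped column.')
        else:
            #Cleaning dataset
            u_id = df[unique_id]
            df = df.drop(unique_id, axis=1)
            #errors='ignore' covers the identifier also being picked in the drop list
            df = df.drop(to_drop, axis=1, errors='ignore')
            st.write("Here is your cleaned dataset")
            st.write(df)

            #Building model using the function we built
            logistic_regression(df)

            #Allowing user the option to download their cleaned data
            st.caption('**Note:** downloading dataset will refresh page and clear results.')
            csv = convert_df(df)
            st.download_button(
               "Download Cleaned Data",
               csv,
               "CleanedDataset.csv",
               "text/csv",
               key='download-csv'
            )
    else:
        #Nothing has been submitted yet, so prompt the user instead of printing to the console
        st.info('Choose your columns in the sidebar and press Predict to run the model.')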

Conclusion

Streamlit is a powerful tool that makes it quick and easy to build machine learning apps. With a fairly small amount of code, we were able to put a working ML model behind a user interface. Give it a try and explore what Streamlit and machine learning can do together!

Bonus Information

This app took about 125 lines of code to build, which is a trifle for many programmers. If you get creative with this code, you can easily build an app that lets the user choose between 6 different machine learning models, preprocesses the data (null values, encoding, and so on), accounts for multicollinearity, and much more. I encourage you to take this code apart and make it your own.
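As one possible starting point for the multicollinearity idea, here is a rough sketch that drops one column from each highly correlated pair of features. This is a simple pairwise-correlation filter rather than a full analysis such as variance inflation factors. The function name, the 0.9 threshold, and the sample data are all assumptions of mine.

```python
import numpy as np
import pandas as pd


def drop_correlated_features(X, threshold=0.9):
    """Drop one column from each pair whose absolute correlation exceeds `threshold`.

    Hypothetical helper; the 0.9 default is an arbitrary starting point.
    """
    corr = X.select_dtypes("number").corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop), to_drop


# Made-up sample data: height_in is an exact rescaling of height_cm.
rng = np.random.default_rng(42)
height_cm = rng.normal(170, 10, size=100)
features = pd.DataFrame({
    "height_cm": height_cm,
    "height_in": height_cm / 2.54,
    "weight_kg": rng.normal(70, 8, size=100),
})

reduced, dropped = drop_correlated_features(features)
```

In the app, the list of dropped columns could be shown to the user before the model runs, so they can see which features were removed and why.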

As always, I hope you enjoyed this article and found it informative. If you did, please consider leaving a clap and following. Have a great day, and I'll catch you in the next article.

Cheers!