Python writing program

6/28/2023

    import pandas as pd
    import streamlit as st
    from sklearn.model_selection import train_test_split


    def load_and_convert_data(file_path):
        df = pd.read_csv(file_path)
        # Initialize dictionary to track string to numeric conversions
        conversions = {}
        # Convert string values to numeric and track conversions in dictionary
        ...
        return df, conversions


    def handle_missing_values(df, threshold):
        # Drop records with more than threshold missing values
        df.dropna(thresh=len(df.columns) - threshold, inplace=True)
        # Impute the remaining missing values with the column median
        df.fillna(df.median(numeric_only=True), inplace=True)
        return df


    def split_data(df, test_size):
        return train_test_split(df, test_size=test_size)


    st.set_page_config(page_title="Data Preprocessing", page_icon=":guardsman:", layout="wide")

    file_path = st.text_input("Enter the path/name of the dataset csv file: ")
    test_size = st.number_input("Enter the train/test split size (decimal between 0 and 1): ", step=0.01, value=0.2)
    threshold = st.number_input("Enter the threshold for the number of missing values per record: ", step=1, value=1)

    if st.button("Process Data"):
        df, conversions = load_and_convert_data(file_path)
        df = handle_missing_values(df, threshold)
        train_df, test_df = split_data(df, test_size)
        st.success("Data preprocessing completed!")

This version is a Streamlit app that takes the same arguments that would otherwise be passed as command-line arguments. It uses the Streamlit library to create an interactive web app where the user enters the file path, the train/test split size, and the threshold for the number of missing values per record. The user can then click the "Process Data" button to run the script and preprocess the data.
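
The threshold rule in handle_missing_values can be tried on a small in-memory table without starting the app. The following is a minimal sketch, not the post's exact code. It assumes that records over the threshold are dropped before the medians are computed, and it returns a new DataFrame instead of modifying its argument in place. The `toy` table and its column names are made up for the example.

```python
import pandas as pd


def handle_missing_values(df: pd.DataFrame, threshold: int) -> pd.DataFrame:
    """Drop rows with more than `threshold` gaps, then median-impute the rest."""
    # dropna(thresh=k) keeps rows that have at least k non-missing values,
    # so k = n_columns - threshold keeps rows with at most `threshold` gaps.
    kept = df.dropna(thresh=len(df.columns) - threshold)
    # Fill the remaining gaps with each numeric column's median.
    return kept.fillna(kept.median(numeric_only=True))


# Hypothetical toy data: row 1 has one gap, row 2 has two gaps.
toy = pd.DataFrame(
    {
        "a": [1.0, None, None, 4.0],
        "b": [10.0, 20.0, None, 40.0],
        "c": [5.0, 6.0, 7.0, 8.0],
    }
)

cleaned = handle_missing_values(toy, threshold=1)
print(cleaned)
```

With a threshold of 1, the record with two missing values is dropped, and the single gap in column `a` is filled with 2.5, the median of the values that remain in that column.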