How I Used Machine Learning To Help A Homeless Shelter

From project planning and architectural design to technical challenges and the machine learning model's results

Trevor Pedersen
7 min read · Feb 4, 2021

About The Shelter

Family Promise is a non-profit organization that provides shelter to homeless families. The work they do is of tremendous value to the community.

In 2019, the shelter provided services to more than 110,000 men, women, and children. They housed nearly 20,000 people — most of whom were children. Since the organization’s inception they’ve housed more than a million people.

With the help of the organization’s services, most of these families return to stable housing and get out of poverty.

To help families in need more efficiently, the organization looked for ways to optimize its day-to-day business operations and wanted insight into how to better transition its guests into permanent housing.

To turn that wish into reality, the organization looked to me and other Lambda School alumni to design a web application that fulfills these needs.

Planning The Project

Our initial planning involved interviewing the organization's stakeholders. We specifically wanted to know what kinds of business problems they wanted to solve and how their users would interface with our software.

From the stakeholder meetings, we identified several core business needs:

  • They would like to move from a paper intake system to a digital intake system. There are over 30 pages of forms guests have to fill out when they reach the shelter, and a digital system would significantly reduce the time spent filling them out. It would also give managers/supervisors the ability to access relevant guest data digitally instead of pulling files by hand.
  • They would like a dashboard that supplies managers/supervisors with statistics and data visualizations about the shelter and guest cases.
  • They would like a model that helps them assess the most likely outcome for arriving guests: whether the guest will transition to permanent housing, an emergency shelter, transitional housing, etc.
  • Additionally, the model needs to be interpretable to case managers. Case managers will be diverting extra resources to guests who are most at risk, so the case manager needs context from the model to make these decisions.

The Framework

In order to meet the stakeholders' needs, we would need a well-planned framework. For our project, we chose a framework that focuses on rapid deployment and scalability, due to the limited time frame we had to develop the project.

For our Frontend, we decided to use React, which lets us rapidly build highly dynamic user interfaces thanks to its virtual DOM. The virtual DOM lets us update the UI frequently without re-rendering the entire page, so we don't have to worry much about performance. Additionally, React components let us reuse code, which saves a significant amount of programming time.

For our Backend, we decided on Node.js, again because it allows for rapid deployment and high scalability. It can scale well beyond anything the shelter will ever need, so they will likely never have to worry about upgrading the backend framework. Additionally, it makes both our Frontend and Backend essentially full JavaScript environments, so the two systems can communicate with each other easily.

Then, finally, we used FastAPI for our Data Science microservice. We chose it because it lets the Data Science team work in Python, the de facto language of Data Science, which allows us to seamlessly integrate our machine learning models and visualizations into the API.

The microservice also lets us work independently of the other teams without having to worry about integrating our code with theirs. It also keeps the more computationally heavy calculations on the microservice rather than on the Node server, where they could cause performance problems.
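
To make this concrete, here is a minimal sketch of what a prediction endpoint on such a FastAPI microservice could look like. The route name, payload fields, and placeholder prediction are illustrative assumptions, not the project's actual code.

```python
# Minimal FastAPI microservice sketch (illustrative only).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Guest(BaseModel):
    # Hypothetical intake fields; the real intake schema has many more.
    days_enrolled: int
    age: int
    income_source: str

@app.post("/predict")
def predict(guest: Guest):
    # In the real service, a trained model would be loaded and queried here.
    prediction = "Permanent Housing"  # placeholder result
    return {"exit_prediction": prediction}
```

Because the service speaks plain JSON over HTTP, the Node backend can call it like any other API.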

Technical Challenges

As a member of the Data Science team, one of our core responsibilities was providing visualizations to the Front End team. However, sending visualizations from a Python-based Data Science microservice to a JavaScript-rendered front end was a little more complicated than we originally thought. The front end required a JSON representation of each image, and those JSON payloads were not re-rendering properly back into images on the front end.

Our solution was to refactor our Pyplot-generated graphs into Plotly-generated graphs. Plotly is designed for web-based, interactive visualizations, serializes cleanly to JSON, and had no issues rendering in a JavaScript environment.
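
As an illustration of that workflow (not the project's exact code), a Plotly figure can be serialized straight to JSON and returned from the microservice, where plotly.js or react-plotly.js can render it on the other side:

```python
# Sketch: serve a Plotly figure as JSON from the FastAPI microservice.
import plotly.express as px
from fastapi import FastAPI
from fastapi.responses import Response

app = FastAPI()

@app.get("/viz/exit-destinations")  # hypothetical endpoint name
def exit_destinations():
    # Made-up counts standing in for real shelter data.
    fig = px.bar(
        x=["Permanent Housing", "Transitional Housing", "Emergency Shelter"],
        y=[120, 45, 30],
        labels={"x": "Exit destination", "y": "Guests"},
    )
    # fig.to_json() yields a JSON string the front end can hand to Plotly.
    return Response(content=fig.to_json(), media_type="application/json")
```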

Another technical challenge we ran into was our unbalanced dataset. The stakeholders want to predict who is most at risk of remaining homeless, but the majority of guests successfully transition back to permanent housing. This skews how a machine learning model interprets the data: the model learns it can simply label everyone as transitioning to permanent housing, which gives fantastic prediction accuracy but isn't very useful to the stakeholders. They are more interested in knowing who is likely to remain homeless, so they can divert more resources and attention to those guests and prevent that outcome.

If the model fails to predict that a guest will end up being homeless, that can be catastrophic for that individual, so it is of utmost importance to get that right.

We can reduce the imbalance by undersampling the majority class. Basically, we want the minority class to make up around 40–50% of our dataset, even though it may originally account for 10–20% of the data or less. We achieve this by deleting some of the rows from the majority class.
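
For example, with the imbalanced-learn library this takes only a few lines. The toy data below stands in for the real guest dataset and assumes a simplified binary target (transitions to permanent housing vs. remains homeless):

```python
# Sketch: randomly undersample the majority class with imbalanced-learn.
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

# Toy data: ~90% majority class (exits to permanent housing), ~10% minority.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)

# Drop majority rows until the minority:majority ratio is 0.8,
# i.e. the minority class makes up roughly 45% of the data.
rus = RandomUnderSampler(sampling_strategy=0.8, random_state=42)
X_under, y_under = rus.fit_resample(X, y)
print(y_under.mean())  # fraction of minority samples, roughly 0.44
```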

After that, if you still have an imbalance, you can perform oversampling. Typically this is done with SMOTE (Synthetic Minority Over-sampling Technique), which, in simple terms, fills the dataset with synthetic data points generated from the minority class.
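
Continuing the sketch above, SMOTE can then synthesize minority-class rows until the classes are fully balanced. Again, this is an illustration rather than the project's exact code, and in practice you would resample only the training split, never the test set:

```python
# Sketch: oversample the remaining imbalance with SMOTE.
from imblearn.over_sampling import SMOTE

smote = SMOTE(sampling_strategy=1.0, random_state=42)  # balance classes 1:1
X_balanced, y_balanced = smote.fit_resample(X_under, y_under)
print(y_balanced.mean())  # roughly 0.5: both classes are now evenly represented
```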

After completing the above steps, you’ll have a balanced dataset between the majority and minority classes. This dramatically changes the way a machine learning model will interpret the data. Importantly, it’ll be significantly more likely to find correlations that lead to better predictions on the minority class.

Creating custom visualizations from SHAP (SHapley Additive exPlanations) values turned out to be quite challenging. SHAP is a state-of-the-art technique that explains a predictive model's output by applying a form of cooperative game theory, assigning each feature a quantified contribution to every prediction. Ironically, SHAP itself can feel like a magical black box that gives you some pretty cryptic raw results.

The above image seems like nothing but random chaos, but it is actually a giant array of SHAP values that explain our model’s predictions. However, unless you understand the deep inner workings of the algorithm, it is hard to use these values to generate custom visualizations. Most online resources will only give you straight out-of-the-box visualizations that might not necessarily be helpful to your stakeholders. After a considerable amount of research, I was able to interpret the SHAP values correctly and turn them into visualizations with code below.
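
The project's original code isn't reproduced here, but the sketch below shows the general idea, assuming a fitted tree-based classifier, a feature DataFrame X, and a simplified binary target; the aggregation and chart are my illustration rather than the article's exact plots.

```python
# Sketch: turn raw SHAP values into a custom feature-importance bar chart.
import numpy as np
import plotly.express as px
import shap

# Assumes `model` is a fitted tree-based classifier (e.g. CatBoost) and
# `X` is the feature DataFrame it was trained on, with a binary target.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one SHAP value per sample per feature

if isinstance(shap_values, list):  # some versions return one array per class
    shap_values = shap_values[1]   # keep the "remains homeless" class

# The mean absolute SHAP value per feature gives a simple global ranking
# of which features drive the model's predictions the most.
importance = np.abs(shap_values).mean(axis=0)

fig = px.bar(
    x=importance,
    y=list(X.columns),
    orientation="h",
    labels={"x": "Mean |SHAP value|", "y": "Feature"},
)
fig.show()
```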

Model Results

Our baseline predictive accuracy was around 34%. After feature engineering, rebalancing the data, and training a CatBoost model, we were able to achieve an accuracy of 76%, more than double the baseline. This accuracy was balanced across all prediction classes.
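
For reference, a bare-bones version of this kind of CatBoost setup might look like the following. The hyperparameters and split are assumptions for illustration, not the tuned model we actually shipped:

```python
# Sketch: train and score a CatBoost classifier on the rebalanced data.
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

# Assumes X_balanced / y_balanced from the rebalancing step above.
X_train, X_test, y_train, y_test = train_test_split(
    X_balanced, y_balanced, stratify=y_balanced, test_size=0.2, random_state=42
)

model = CatBoostClassifier(
    iterations=500,  # illustrative hyperparameters
    depth=6,
    learning_rate=0.1,
    verbose=False,
)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```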

Through the use of SHAP plots, we were able to generate graphs that provide detailed explanations of why our CatBoost model was making the predictions it did.

These visualizations are incredibly helpful to case managers because they give a general overview of what causes a guest to temporarily or permanently exit the shelter. From the graph, we can see that being in the shelter longer does help guests transfer to permanent housing. It also suggests that guests who repeatedly check back into the shelter may be at greater risk of remaining there. Finally, though it may be difficult to tell at first glance, guests who receive some kind of government assistance (State Funded, Social Security, Medicaid) appear more likely to make it to permanent or transitional housing.

Above, we have another graph intended to help case managers make informed decisions about a particular guest. This graph details why our CatBoost model made the prediction it did for that guest. According to the model, the guest's small number of days enrolled in the project pushes the prediction toward transitional housing, while their age of 38 pushes against it.

In Summary

I think the project turned out to be a great success. The model was able to produce fairly accurate predictions on a relatively limited and unbalanced dataset. We were also able to create simple-to-understand visualizations to help case managers understand their guests' risk levels. With this understanding, high-risk guests should receive more resources, which, in turn, should reduce the likelihood of them becoming homeless in the future.

Reflections

I’m really grateful that I got to work with Family Promise. It isn’t very often you get to create software that can positively impact so many people’s lives — and I appreciate the opportunity. I also got to work with a lot of amazing people who taught me a lot about working in a team.
