Awww BATS: A Balanced Approach to Applicant Admissions
Hey there! It's been a while but I'm back with another post about and interesting problem I've had recently. I'll quickly introduce the situation and then talk about how we solved it.
The Problem
At SwampHacks, University of Florida's flagship hackathon, we as organizers need to review and select potential attendees from a list of 1000+ applicants. Our team is relatively small, approximately 30-ish members who all work around different times. Within the organization, we have smaller teams (such as the tech team, which is about 6 people) who are more synchronized. As such, it is really hard to agree on applicants to admit since we can't feasibly coordinate sitting down and reviewing applications all at once and making decisions. Furthermore, each applicant may be on a team, and we would prefer to keep teams together in admissions. So how do we make this process automated? And in that same line of thought, how can we make it fair?
Brainstorming
At SwampHacks, we hold two aspects of an application as the most important: your experience and passion for tech. We still aren't sure whether a more passionate group or a more experienced group creates a better hacker cohort. To be more nuanced, the more passionate you are the more likely you are to be also experienced. However, we like to have passion as an extra field because applicants that are younger can be very passionate but have limited experience (internships, research, projects, etc.) We judge an applicant's passion from their writing pieces in the application such as the responses to the essay questions.
Example:
Tell us about a project you have built and how you collaborated with others.
With these two quantifiers in mind, we asked organizers to rate each applicant on a score from 1 to 5 for each aspect. These answers would be saved in our database and the application would be marked as Reviewed.
The BAT Algorithm
Here I shall introduce the one, the only, BAT algorithm. It stands for Balanced Admissions Thresher! Thresher: a person or machine that separates grain from the plants.
The BAT algorithm is comprised of 3 primary parts: preprocessing, staging bucket isolation, and staged selection. We will go step by step as I explain. The following code snippets are written in Golang.
1) Preprocess
Preprocessing the applications includes assigning scores to applications, grouping teams, and other computations before moving on to the stage isolation step.
i) Weighted Scoring
For each applicant , we calculate a weighted score comprised of a weighted sum of the scores given for passion and experience from before.
Formally, the weighted score for an applicant is defined as:
where:
and are the weights assigned to passion and experience, respectively, satisfying
and are the scores for passion and experience, each within the interval
is a small constant introduced to prevent zero-valued scores (useful later).
ii) Group Applicants
For each applicant, we need to group them into two buckets: individuals or teams. If an applicant was on a team, then we try our best to keep teams together so we have a dedicated selection bucket for teams.
For teams, a teams weighted score is the average of all its members scores. A team is compared against all other teams for admissions but the specifics of the algorithm is specified in the selection section below.
2) Staging Bucket Isolation
As per our head organizers, we have a few constraints and necessary policies to consider. For example, we want to have a best effort for 70% of all acceptances to go to applicants from the University of Florida and the remaining to external schools. Furthermore, we need to make a best effort for 65% (for example) of acceptances at UF to come from what we consider early career applicants (Freshmen-Sophomore).
When we structure these constraints, we form 4 dedicated selection pools that compete with each other for a spot. Our slots include UF Early Career, UF Late Career, Other Early Career, and Other Late Career. This split is also what keeps the selection process fair. Consider that early career applicants are on average less experienced than their later career counterparts. If we were to compare the weighted averages and select from a primary pool with everyone, then later career applicants tend to dominate. Therefore, isolating each group allows for more fair competition within itself.
To handle a lack of applicants in a specific pool, allocations for final slots may be rolled over to its sister node. For example, leftover allocations to UF Early may be given to UF Late. And vice versa (for Other too)
3) Selection Phase
Finally we can begin the selection phase. To be completely fair, the selection phase and the bucket isolation phase are intertwined and the bucket isolation doesn't happen until after the team phase of the selection phase.
The Selection Algorithm
We use a modified version of the EfraimidisāSpirakis random weighted sampling. This algorithm ensures that we randomly select applicants without replacement while still ensuring the weight has a significant affect on the outcomes. This is perfect since we want everyone to have at least a chance to get accepted no matter how poor their application review (and also reduces biases introduced by human reviewers).
The algorithm is as follows:
where:
- is a uniform random number drawn for applicant
- is the weighted score of applicant or team
- is the key used to rank applicants/teams for weighted sampling without replacement
Then we can sort by the key and select the top applicants/teams for that iteration.
i) Team Selections
Team selections happen first. We have a certain amount of acceptance slots allocated to team member. For example, we may allot 50 slots to teams and for teams of 4 members or less this gives us a minimum of 13 slots for teams ( 50 // 4 = 12 and 50 % 4 = 2 so total 12 teams of 4 and 1 team of 2).
We will apply the EfraimidisāSpirakis random weighted sampling algorithm specified above on the team's averaged score and iterate through the teams and filling accepted slots until we run out of team spot quotas.
Afterwards, every non selected team's members will be moved to the individual applicant selection phase to give them a chance of making it in on individual prowess (and luck!).
ii) Individual Selection
Finally, we will loop twice over our buckets created from before (in order to cover prorogation both ways back and forth from sister nodes to each other). For each loop we will calculate and individuals sort key using the random sampling algorithm and then select the top applicants where is the quota for that bucket.
Conclusion
After all that mayhem, we were able to select 500 applicants for SwampHacks XI out of 1000+ applications with ease! We know this algorithm and process is not yet perfect and there are many ways to expand on its options such as adding an arbitrary amount of quantifiers for scoring and a dynamic bucket creation with rulesets. However, this current version allows us to fairly accept applicants while giving everyone, no matter their score, a chance of getting in.
Thanks, Wala out!
Source(s)
[1] https://www.sciencedirect.com/science/article/abs/pii/S002001900500298X
[2] BAT Engine