wala's tech blog

Awww BATS: A Balanced Approach to Applicant Admissions

Hey there! It's been a while but I'm back with another post about and interesting problem I've had recently. I'll quickly introduce the situation and then talk about how we solved it.

The Problem

At SwampHacks, University of Florida's flagship hackathon, we as organizers need to review and select potential attendees from a list of 1000+ applicants. Our team is relatively small, approximately 30-ish members who all work around different times. Within the organization, we have smaller teams (such as the tech team, which is about 6 people) who are more synchronized. As such, it is really hard to agree on applicants to admit since we can't feasibly coordinate sitting down and reviewing applications all at once and making decisions. Furthermore, each applicant may be on a team, and we would prefer to keep teams together in admissions. So how do we make this process automated? And in that same line of thought, how can we make it fair?

Brainstorming

At SwampHacks, we hold two aspects of an application as the most important: your experience and passion for tech. We still aren't sure whether a more passionate group or a more experienced group creates a better hacker cohort. To be more nuanced, the more passionate you are the more likely you are to be also experienced. However, we like to have passion as an extra field because applicants that are younger can be very passionate but have limited experience (internships, research, projects, etc.) We judge an applicant's passion from their writing pieces in the application such as the responses to the essay questions.

Example:

Tell us about a project you have built and how you collaborated with others.

With these two quantifiers in mind, we asked organizers to rate each applicant on a score from 1 to 5 for each aspect. These answers would be saved in our database and the application would be marked as Reviewed.

The BAT Algorithm

Here I shall introduce the one, the only, BAT algorithm. It stands for Balanced Admissions Thresher! Thresher: a person or machine that separates grain from the plants.

The BAT algorithm is comprised of 3 primary parts: preprocessing, staging bucket isolation, and staged selection. We will go step by step as I explain. The following code snippets are written in Golang.

1) Preprocess

Preprocessing the applications includes assigning scores to applications, grouping teams, and other computations before moving on to the stage isolation step.

i) Weighted Scoring

For each applicant i, we calculate a weighted score comprised of a weighted sum of the scores given for passion and experience from before.

Formally, the weighted score for an applicant is defined as:

Si=wpassion·spassion+wexp·sexp+0.1

where:

wpassion and wexp are the weights assigned to passion and experience, respectively, satisfying wpassion+wexp=1.0

spassion and sexp are the scores for passion and experience, each within the interval [1,5]

ϵ=0.1 is a small constant introduced to prevent zero-valued scores (useful later).

ii) Group Applicants

For each applicant, we need to group them into two buckets: individuals or teams. If an applicant was on a team, then we try our best to keep teams together so we have a dedicated selection bucket for teams.

For teams, a teams weighted score is the average of all its members scores. A team is compared against all other teams for admissions but the specifics of the algorithm is specified in the selection section below.

2) Staging Bucket Isolation

As per our head organizers, we have a few constraints and necessary policies to consider. For example, we want to have a best effort for 70% of all acceptances to go to applicants from the University of Florida and the remaining to external schools. Furthermore, we need to make a best effort for 65% (for example) of acceptances at UF to come from what we consider early career applicants (Freshmen-Sophomore).

When we structure these constraints, we form 4 dedicated selection pools that compete with each other for a spot. Our slots include UF Early Career, UF Late Career, Other Early Career, and Other Late Career. This split is also what keeps the selection process fair. Consider that early career applicants are on average less experienced than their later career counterparts. If we were to compare the weighted averages and select from a primary pool with everyone, then later career applicants tend to dominate. Therefore, isolating each group allows for more fair competition within itself.

To handle a lack of applicants in a specific pool, allocations for final slots may be rolled over to its sister node. For example, leftover allocations to UF Early may be given to UF Late. And vice versa (for Other too)

3) Selection Phase

Finally we can begin the selection phase. To be completely fair, the selection phase and the bucket isolation phase are intertwined and the bucket isolation doesn't happen until after the team phase of the selection phase.

The Selection Algorithm

We use a modified version of the Efraimidis–Spirakis random weighted sampling. This algorithm ensures that we randomly select applicants without replacement while still ensuring the weight has a significant affect on the outcomes. This is perfect since we want everyone to have at least a chance to get accepted no matter how poor their application review (and also reduces biases introduced by human reviewers).

The algorithm is as follows:

ki=vi1/wi

where:

Then we can sort by the key and select the top n applicants/teams for that iteration.

i) Team Selections

Team selections happen first. We have a certain amount of acceptance slots allocated to team member. For example, we may allot 50 slots to teams and for teams of 4 members or less this gives us a minimum of 13 slots for teams ( 50 // 4 = 12 and 50 % 4 = 2 so total 12 teams of 4 and 1 team of 2).

We will apply the Efraimidis–Spirakis random weighted sampling algorithm specified above on the team's averaged score and iterate through the teams and filling accepted slots until we run out of team spot quotas.

Afterwards, every non selected team's members will be moved to the individual applicant selection phase to give them a chance of making it in on individual prowess (and luck!).

ii) Individual Selection

Finally, we will loop twice over our buckets created from before (in order to cover prorogation both ways back and forth from sister nodes to each other). For each loop we will calculate and individuals sort key using the random sampling algorithm and then select the top n applicants where n is the quota for that bucket.

Conclusion

After all that mayhem, we were able to select 500 applicants for SwampHacks XI out of 1000+ applications with ease! We know this algorithm and process is not yet perfect and there are many ways to expand on its options such as adding an arbitrary amount of quantifiers for scoring and a dynamic bucket creation with rulesets. However, this current version allows us to fairly accept applicants while giving everyone, no matter their score, a chance of getting in.

Thanks, Wala out!

Source(s)

[1] https://www.sciencedirect.com/science/article/abs/pii/S002001900500298X
[2] BAT Engine