On Medici and Thiel
I think the single most misdirected bit of philanthropy in this decade is Peter Thiel's special program to bribe people to drop out of college. ~ Larry Summers
So begins Rohit’s “On Medici and Thiel”, a formative piece of prose that helped clarify my thinking about youth patronage generally and the Thiel Fellowship in particular. The terms of the fellowship are relatively modest: $100k per fellow, 20 annual recipients.
And yet, the “returns” for such billionaire modesty are outsized, to say the least, as the following are just a sampling of the recipients[1]:
Vitalik Buterin, founded Ethereum
Austin Russell, Luminar CEO
Kaushik Tiwari, Better Financial CEO
Dylan Field, founded Figma
Ritesh Agarwal, founded Oyo
Alex Rodriguez, Embark CEO
The total investment to date has been even more modest:
“It is also true that it's such an absurdly small amount of money that Thiel has spent. Altogether, for the 217 participants, Thiel shelled out around $22m. Add another 10-15% for administrative expenses, call it an even $25m, maybe more for kombucha and massages so $30m. To compare this, he individually contributed $20m just recently to the Senate races in Arizona.”
We also have Emergent Ventures, which I’ve written about before and which, though not exclusively targeting youth, does yield a solid proportion of youthful recipients. These grants are far smaller on average than Thiel’s, with many grants in the $10k range, but as Emergent’s founder Tyler Cowen has previously opined, the value lies less in the money than in the support for and belief in the recipient.
“Supplying people, especially younger people, visions of what they could be, is greatly undersupplied…In some grants I’ve given out through Emergent Ventures to younger people, I’ve also tried to give them a sense of what I think they could be and I suspect that’s more important in some cases than the grant. In a way it’s complemented by the grant…The grant makes the vision more vivid or more focal, like they believe the vision because you spent real dollars on them.”
There are more granting programs than these two, of course, but they are the best known and most prestigious of those that touch our tech-focused youth cohorts today[2]. Rohit provides us with a handy 2x2 matrix in which he roughly organizes the two primary forms of capital in question here:
“You could quibble with the placement of various orgs on that chart, but the key problem is that the top right quadrant is frustratingly empty! We need a Givewell for top talent.”
He then offers a challenge:
“What we need is the YC of talent identification and encouragement. We either need Thiel to step up and 100x his fellowship, or a 100 other billionaires to step up and stop whining about talent.”
Why We Stopped Making Einsteins
One supposition of a 100x increase in Thiel-like fellowships is that there is sufficient supply to accept the increase in awards. I’m not quite sure how we’d establish this baseline, but I’m a bit concerned that we might quickly hit the upper edge of the genius-adjacent recipient pool. What we should strive for is a path to ensure, or at least increase the probability, that we can develop as many three-sigma performers as our species can muster. We shouldn’t leave such things to chance!
Early genius occasionally appears to spring from the ether (here I’m thinking of the Ramanujans of the world), but far more often our historical geniuses were not, in fact, self-taught; nor were they “normally” schooled. Many were instead brought up with what Hoel calls “aristocratic tutoring”. The list of such tutoring recipients is impressive (of course):
Marcus Aurelius
Bertrand Russell
John von Neumann
Charles Darwin
Voltaire
John Stuart Mill
Karl Marx
Hannah Arendt
Hoel comments in “Why We Stopped Making Einsteins”:
“With these examples in mind, it’s likely that a significant contributing factor for the phenomenon of genius running in families is that genius family members act as aristocratic tutors, encouraging learning, the life of the mind, and inculcating the pursuit of the higher mysteries in the young.”
Building further on this in “How Geniuses Used to Be Raised”, Hoel quotes Richard Feynman:
“I think, however, that there isn’t any solution to this problem of education other than to realize that the best teaching can be done only when there is a direct individual relationship between a student and a good teacher—a situation in which the student discusses the ideas, thinks about the things, and talks about the things. It’s impossible to learn very much by simply sitting in a lecture, or even by simply doing problems that are assigned. But in our modern times we have so many students to teach that we have to try to find some substitute for the ideal.”
Note that aristocratic tutoring does not look like our present view of tutors. Hoel again, quoting from John Stuart Mill’s autobiography:
“this also my father taught me: it was the task of the evenings, and I well remember its disagreeableness. But the lessons were only a part of the daily instruction I received. Much of it consisted in the books I read by myself, and my father’s discourses to me, chiefly during our walks. . . I made notes on slips of paper while reading, and from these in the morning walks, I told the story to him. . .”
Hoel calls out the stark contrast:
“Think of the difference between this and modern tutoring, wherein one meets an older student at a coffeeshop to grind SAT problems. The aristocratic method is unhurried and less structured, sometimes even conducted best, it seems, on walks.”
Now, one obvious objection to this line of thinking (that genius is not solely born) is the lack of counterfactuals. Yes, Russell and Mill were rigorously tutored from a young age, but we can’t know for certain that they would not have arrived at a similar place in the absence of such educational attention. While we can’t retroactively unpack these alternate genius histories, we do have good evidence that tutoring has an outsized impact on performance. Hoel once again, in “Why We Stopped Making Einsteins”:
“Tutoring, one-on-one instruction, dramatically improves student’s abilities and scores. In education research this effect is sometimes called “Bloom’s 2-sigma problem” because in the 1980s the researcher Benjamin Bloom found that tutored students
“. . performed two standard deviations better than students who learn via conventional instructional methods—that is, "the average tutored student was above 98% of the students in the control class.””
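To make Bloom’s figure concrete, here is a minimal sketch, assuming scores in the control class are normally distributed (my assumption, not something the quote states). The function name `share_of_control_below` is hypothetical; it simply evaluates the standard normal CDF at a given effect size.

```python
from statistics import NormalDist


def share_of_control_below(effect_size_sd: float) -> float:
    """Fraction of a normally distributed control class scoring below a
    student who sits `effect_size_sd` standard deviations above its mean."""
    return NormalDist().cdf(effect_size_sd)


# Bloom's claim: the average tutored student sits two SDs above the control mean.
two_sigma_share = share_of_control_below(2.0)
print(f"Share of control class outperformed: {two_sigma_share:.1%}")
```

Under that assumption, a two-standard-deviation gain puts the average tutored student ahead of roughly 98% of the control class, which lines up with the figure in the quote.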
Even if one agrees that “aristocratic tutoring” is the preferred method for both honing intellectual capability and maximizing humanity’s “genius supply”, we still must deal with the first half of this method: the “aristocratic” piece. It’s all well and good that so many of our historical geniuses came from wealth and saw that wealth applied toward a robust tutor-led education that far outstripped traditional education, but should we be satisfied continuing this approach? Should wealthy children be the sole beneficiaries of this approach? Or inverted, should humanity discard such methods for the 99% of our global population who can’t, of their own means, afford this level of education?
Rethinking Education
Thiel took a ton of heat for the Fellowship’s approach before it was fashionable. As Summers’ quote at the outset of this piece attests, many MANY found the very idea of incentivizing the skipping of a traditional college education repugnant at best. Since the launch, public (and especially youth) views on the value of college education have dipped substantially; not quite to Thiel’s level of “college isn’t useful”, but trending more in this direction.
Such heterodox views shouldn’t in fact be so controversial, as we have for years promulgated the false narrative that our educational system is an unquestionable good, responsible for so much of the innovation-led progress we’ve enjoyed the past 100 years.
de Boer, in “Education Doesn’t Work 2.0”, cites research demonstrating that: “Over the last 50 years in developed countries, evidence has accumulated that only about 10% of school achievement can be attributed to schools and teachers while the remaining 90% is due to characteristics associated with students. Teachers account for from 1% to 7% of total variance at every level of education. For students, intelligence accounts for much of the 90% of variance associated with learning gains.”
Now, his framing of these stats concerns the inability of our education systems (current or historical) to reliably move students between different educational bands:
“The brute reality is that most kids slot themselves into academic ability bands early in life and stay there throughout schooling. We have a certain natural level of performance, gravitate towards it early on, and are likely to remain in that band relative to peers until our education ends.”
My focus here does not exactly overlap with his. It’s not that I’m uninterested in the inequity of education; my goal here is geared more toward the absolute returns of improved education, especially with regard to optimally elevating the potential of our genius-adjacent class, who would ultimately receive our grants once they “came of age”, as it were.
That is, de Boer is focused on the entirety of the educational distribution; I’m specifically focused here on the two-sigma and above piece of this likely normal-ish curve[3]. Because statistics are quite often unintuitive, and given the cries of elitism that inevitably spring up when discussing such things, I think it’s important to call out just how large a population we’re talking about here.
Given a normal curve, the tail above two sigma (two standard deviations above the mean) holds ~2.5% of a given population, or closer to 2.3% when computed exactly. With approximately 2.5B children under the age of 18 globally, a two-sigma population at the rounder 2.5% entails 62.5M children who might, with additional focused effort, achieve globally-relevant impact. For reference, this is close to the population of the UK, and roughly 60% larger than California’s population.
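The headcount above can be sanity-checked with a short sketch. It assumes a perfectly normal distribution and takes the 2.5B figure for children worldwide at face value; `upper_tail_share` and `headcount` are hypothetical helper names. The ~2.5% shorthand comes from the familiar rule that about 95% of a normal population sits within two sigma of the mean, leaving about 5% split across the two tails.

```python
from statistics import NormalDist

CHILDREN_WORLDWIDE = 2.5e9  # the essay's rough global under-18 estimate


def upper_tail_share(sigmas: float) -> float:
    """Share of a normal population lying more than `sigmas` SDs above the mean."""
    return 1.0 - NormalDist().cdf(sigmas)


def headcount(share: float, population: float = CHILDREN_WORLDWIDE) -> float:
    """Number of people in a population that a given share represents."""
    return share * population


exact_share = upper_tail_share(2.0)  # one-sided tail, computed exactly
rounded_share = 0.025                # the essay's ~2.5% shorthand

for label, share in (("exact tail", exact_share), ("rounded 2.5%", rounded_share)):
    print(f"{label}: {share:.2%} of children, about {headcount(share) / 1e6:.1f}M")
```

The exact tail comes out a little smaller than the rounded figure, but either way the two-sigma pool is in the tens of millions, which is the point of the comparison.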
The impact here could be enormous if we figured out a better way to harness such innate talent. Our standard educational systems are simply not the answer, as each school represents
“a competitive academic meritocracy wrapped in an obtuse hierarchical bureaucracy, a structure in which they will spend most of their young adult life, forced to learn mostly from their peers, who know as little as they do.”
And it’s not just the bureaucratic weight; it’s also the ineffectual nature of the institution:
“Due to skyrocketing costs, a top private high school costs around $70,000 a year, and yet statistics struggle to find any advantage in outcome from sending a student there”
This is not just about high tuitions. It’s about the ineffectiveness of seemingly every intervention meant to fundamentally bend the learning curves (forgive the long quote):
“Winning a lottery to attend a supposedly better school in Chicago makes no difference for educational outcomes. In New York? Makes no difference. What determines college completion rates, high school quality? No, that makes no difference; what matters is “pre-entry ability.” How about private vs. public schools? Corrected for underlying demographic differences, it makes no difference. (Private school voucher programs have tended to yield disastrous research results.) Parents in many cities are obsessive about getting their kids into competitive exam high schools, but when you adjust for differences in ability, attending them makes no difference. The kids who just missed the cut score and the kids who just beat it have very similar underlying ability and so it should not surprise us in the least that they have very similar outcomes, despite going to very different schools.”
Let’s for argument’s sake say that our goal is to offer 1:1 aristocratic tutoring from ages 5-17 for this two-sigma population of 62.5M globally. No, that’s too large an initial bite. Let’s restrict to just the US, so that we avoid most of the language and cultural challenges inherent to anything global. Using estimates of ~330M US citizens, 22% of whom are minors, our two-sigma US population is ~1.8M children (about the size of West Virginia).
How might we (un)reasonably attempt to build an “aristocratic tutoring” program to carry this two-sigma population from age 5-17?
Scaling Aristocratic Tutoring
I was fortunate as a child to participate in a school for gifted children. Once a month, from 2nd through 8th grade, we were bussed from our traditional public school to a separate location, where we the children selected our very rough areas of interest (for me, 90% math-related) and, rather than the typical lecture format to which we were all accustomed, would instead engage in loosely guided “intellectual play”. Really this was lightly structured “curiosity pursuance”, which I would later come to learn was modeled loosely off of the Montessori tradition.
These scant few days each year were the best learning experiences I’ve ever had, even as a relentless, self-driven learner. Quick postscript - to my complete non-surprise, this specialty school was shut down a number of years after I’d aged out, as it dared to violate the unfortunate institutional virtues of (false) egalitarianism and equity. Compared to the standard (good quality) public education system that comprised the bulk of my schooling hours, this specialized learning environment was unsurprisingly far more influential on my current thinking.
Given that these two-sigma kids are likely self-motivated, we should expect the majority of their education time will be spent in solo study, with periodic interludes for engagement with their tutors. If we estimate 35 hours per week[4] for dedicated education time, let’s conservatively estimate that 60% of this time (21 hours) is solo learning, with the remaining 14 hours earmarked for tutelage.
Let’s also assume that each student is engaged in five domains per week (on an ever-evolving basis). Each tutor then spends 2.8 hours per week with each pupil. Let’s further assume that prep time for these sessions represents another 50% of time per pupil per week, for 4.2 hours total. Given a 45-hour work week, each tutor can hypothetically take on about 10 students at a time.
We also need to account for the fact that within a given domain like “science”, a given tutor, even a polymath, will not be ideally suited for all sub-domains, and therefore we will need multiple tutors per discipline for each student’s educational life cycle. Let’s say that across the 13 years of typical schooling, from age 5 through 17, each student requires four different tutors per domain. With five domains, this brings us to 20 total tutors per student.
Given these very rough assumptions, we’re in need of well over 100,000 tutors even if each student needed only one tutor at a time; covering all five concurrent domains per student pushes the figure toward the high hundreds of thousands. There are just shy of 4 million full- and part-time teachers in the US, so our estimate here requires anywhere from under 5% to roughly 20% of this total population.
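The chain of assumptions above is easy to lose track of, so here is a minimal sketch that recomputes it from the stated inputs. Everything in it is a rough assumption carried over from the text (population, shares, hours), and `estimate_tutors` is a hypothetical helper name. Computed directly, 2.8 contact hours plus 50% prep comes to 4.2 hours per pupil. The sketch reports two tutor counts: one if each student occupies a single tutor slot, and one if all five concurrent domains must be staffed.

```python
import math


def estimate_tutors(
    us_population: int = 330_000_000,
    minor_share: float = 0.22,
    two_sigma_share: float = 0.025,
    weekly_study_hours: float = 35.0,
    solo_share: float = 0.60,
    domains_per_week: int = 5,
    prep_overhead: float = 0.50,
    tutor_work_week_hours: float = 45.0,
) -> dict:
    """Back-of-envelope count of full-time tutors under the essay's assumptions."""
    students = round(us_population * minor_share * two_sigma_share)
    tutored_hours = weekly_study_hours * (1.0 - solo_share)   # hours with tutors
    contact_hours = tutored_hours / domains_per_week          # per tutor, per pupil
    load_per_pupil = contact_hours * (1.0 + prep_overhead)    # contact plus prep
    pupils_per_tutor = math.floor(tutor_work_week_hours / load_per_pupil)
    return {
        "students": students,
        "load_per_pupil_hours": load_per_pupil,
        "pupils_per_tutor": pupils_per_tutor,
        # One tutor slot per student at any given time.
        "tutors_single_slot": math.ceil(students / pupils_per_tutor),
        # Every student staffed in all concurrent domains.
        "tutors_all_domains": math.ceil(
            students * domains_per_week / pupils_per_tutor
        ),
    }


estimate = estimate_tutors()
print(f"students: {estimate['students']:,}")
print(f"hours per pupil per tutor: {estimate['load_per_pupil_hours']:.1f}")
print(f"pupils per tutor: {estimate['pupils_per_tutor']}")
print(f"tutors (single slot): {estimate['tutors_single_slot']:,}")
print(f"tutors (all domains): {estimate['tutors_all_domains']:,}")
```

Changing any single input, such as the prep overhead or the work week, shifts the totals noticeably, which is a fair reminder of how soft these numbers are.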
But wait, no, that isn’t correct. I’ve made a fundamental mistake with one of our assumptions, which is not surprising given my own educational training. The aristocratic tutoring model is not composed of “full-time tutors”; almost by definition, these tutors are full-time practitioners, and most certainly opsimaths. Each of these tutors is thus likely to spend only a portion of her time each week in tutelage, which significantly cuts down the number of concurrent pupils to, say, 1-3.
We have a further, as-yet undiscussed constraint, which is geography. Historically all aristocratic tutelage understandably occurred locally, in-person. Despite significant technological advancements in both online teaching environments specifically and digital communications more broadly (both of which we’ll return to shortly), the ideal teaching environment is and will continue to be in the flesh. It’s very difficult to go on the aforementioned “learning walks” if our pair cannot walk together!
Suffice it to say that these additional constraints (leveraging the fractional time of seasoned professionals across each domain, and in-person tutoring) push our necessary supply of tutors far higher, likely into the million-plus range. This piece is already long enough that a bottom-up deep dive into potentially available talent per field is unwise, but let’s all just agree that at present these constraints provide an untenable path for achieving our goals.
Thus we return, as always, to technology. A large group of researchers last year launched a study comparing an AI tutor against learning through a typical massive open online course (MOOC) and found “average normalized learning gains” 2.5x higher than those of students learning through the MOOC. This is even more promising when we acknowledge that MOOCs themselves have been shown to outperform traditional schooling environments. So the advantage of current-state AI tutors over the traditional schooling that most of our two-sigma students receive today could plausibly be on the order of 3-5x. These are incredible results!
Let’s further acknowledge that these systems are in the very early stages of development, and with further refinement should (in theory) far surpass current results. As we’ve seen with sophisticated AI systems in games like chess, human+AI teams have at times outperformed either component individually, and it seems reasonable to hope that this condition will also manifest in our case. That is, a great tutor coupled with an incredible AI should yield optimal returns for each of our two-sigmas.
Closing Thoughts
This piece (and this series) was ostensibly meant to focus on the grant-making apparatus to empower our most talented youth. My intention was to discuss how we might massively scale up Thiel or Emergent-like grants. Frankly, I think this is a pretty straightforward endeavor that just requires greater dedicated capital. My deviations upstream of these grants reflect my unease about the total supply of generational thinkers and practitioners. It’s one thing to decide to offer funding to, say, 10k of our most talented minds every year; it’s another thing entirely if we don’t actually have this number available for such funding.
I focused on the two-sigma population because it’s most closely associated with grant recipients, but our educational failures quite obviously extend across the entire distribution. It’s imperative that we fundamentally evolve beyond the current sclerotic, dogmatic, and obviously sub-optimal educational institutions; technology will certainly help here, but in my eyes this is an insufficient salve. We need new systems, not just new technologies, that can meet the demands of our now 8-billion-person planet if we are to maintain innovation velocity.
[2] For example, the Fulbright Program, which is more of a scholarship-adjacent, globally-focused enrichment program for (primarily) current college students. Or less formalized and rarely publicized grants from the likes of 1517 Ventures, the founders of which non-coincidentally created the Thiel Fellowship before spinning off on their own.
[3] Earlier in the piece I referenced three-sigma genius. Why then push upstream in the distribution? Ultimately this is just a funneling decision. I’m not totally clear that at a super young age we can detect three-sigma performers, so we’re widening the funnel to the two-sigma level to (hopefully) capture a much higher proportion of total three-sigma achievers.
[4] The standard school day today is something like 8am to 3pm, or 7 hours. Actual learning time is significantly less than this, but modeling our time spent off of this “standard” model that also happens to be just below a “standard” work week feels moderately correct.