I know where they didn't get their numbers. They didn't get them from the U.S. Departments of Education, Labor, or Commerce, nor from the National Science Foundation, nor from the Carnegie Foundation, nor from any other reliable source of public information of which I am aware.
- The numbers I retrieved from reliable sources indicate that Silicon Valley, a very small but very wealthy sector, could easily hire a few thousand highly qualified Black technical staff by outbidding other sectors, thereby achieving 12 percent diversity in four or five years.
- On the other hand, there is reason to question whether enough Black PhDs exist in most fields to enable the financially pressed academic sector to hire the hundreds of thousands of additional Black faculty that would be required to achieve 12 percent diversity in all colleges and universities within the next four or five years. And only the wealthiest fifteen or twenty institutions would be able to compete with Silicon Valley (and Wall Street) for the best of the best.
As regular readers of this blog may be aware, for the last seven months I have been enrolled in certified MOOCs in "Data Science" -- the first from M.I.T. and the most recent six from Johns Hopkins University. These courses cover a collection of techniques that some call "data science", others call "statistical learning", and still others call "machine learning". The impressive hotshots teaching the courses at Hopkins also present persuasive arguments for something they call "Reproducible Research". In my case, reproducible research would look something like this:
- In addition to publishing a report on this blog that begins with raw data retrieved from reliable sources and then derives results from the raw data, I would also include a link to a "literate program" on GitHub that would look like the same report. It would include the same text, tables, and graphs. But it would also include the code I actually used to download the data from the original sources and all of the code that I used to perform the calculations, produce the tables, and generate the graphs.
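To make the idea concrete, here is a minimal sketch of what the code half of such a report might look like, written in Python purely for illustration. Everything in it is an assumption on my part: the embedded CSV stands in for data that a real report would download from its original source, the sector names and counts are made-up placeholders rather than real figures, and the function names (`load_rows`, `additional_hires_needed`, `build_table`) are hypothetical. The one number taken from the discussion above is the 12 percent target.

```python
import csv
import io

# HYPOTHETICAL placeholder data, not real figures. A real report would
# download the raw file from its original source and record where and
# when it was retrieved.
RAW_CSV = """sector,total_staff,black_staff
example_sector_a,1000,50
example_sector_b,2000,300
"""


def load_rows(text):
    """Parse the raw CSV text into a list of dicts with integer counts."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "sector": r["sector"],
            "total": int(r["total_staff"]),
            "black": int(r["black_staff"]),
        }
        for r in reader
    ]


def additional_hires_needed(row, target_pct=12):
    """Smallest h with (black + h) / (total + h) >= target_pct percent.

    Assumes every new hire adds to the total headcount (nobody leaves).
    Integer arithmetic avoids floating-point rounding surprises.
    """
    numerator = target_pct * row["total"] - 100 * row["black"]
    if numerator <= 0:
        return 0  # already at or above the target
    denominator = 100 - target_pct
    return -(-numerator // denominator)  # ceiling division


def build_table(rows, target_pct=12):
    """Return the report table as a list of (sector, share %, hires) tuples."""
    return [
        (
            r["sector"],
            round(100 * r["black"] / r["total"], 1),
            additional_hires_needed(r, target_pct),
        )
        for r in rows
    ]


if __name__ == "__main__":
    for sector, share_pct, hires in build_table(load_rows(RAW_CSV)):
        print(f"{sector:20s} {share_pct:5.1f}% {hires:6d}")
```

The point of the sketch is the shape, not the numbers: the raw data, each calculation, and the table all live in one script, so a reader can rerun it and check every step instead of taking my word for the results.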
One initial disadvantage of my new approach derives from the fact that I will be coding my first reports in R, the preferred language of professional statisticians. R is an extraordinarily powerful language with dazzling graphics packages but, unfortunately, it is also one of the quirkiest computer languages ever devised. Although its popularity within the data science/machine learning community is rapidly increasing, it's unlikely that it will ever become as popular as Python, a language that is justly famous for being easy to learn.
Why did I start off with R? I started with R because R was the language used in my first data science course, the MOOC I took last spring that was offered by M.I.T., the course that blew my mind as few other courses have ever done. Enrolling in the extensive set of R-based courses offered by Hopkins was a logical follow-up.
Nevertheless, I want my reproducible reports to be accessible to the largest possible readership, so in the spring of 2016 I will enroll in Udacity's nanodegree program in Machine Learning, wherein Python is the language of choice. I anticipate that these MOOCs will not only strengthen my skills in the application of machine learning techniques but also enable me to create reproducible reports coded in Python.