Summary of the Global AI and Data Science Education Landscape

There have been a number of efforts led by professional societies, industry groups, and foundations to address AI and Data Science training and curriculum development at different levels. Section 2.1 provides a summary of Data Science knowledge and skills and curriculum guidelines for undergraduate programs. In Section 2.2, we describe some recent efforts at defining an AI curriculum.
2.1 Data Science
While there is no unique definition, there is consensus that Data Science is a new discipline, an interdisciplinary field that leverages principles and tools from computer science, mathematics, and statistics to draw insights from data. The Royal Society in its report, Dynamics of Data Science Skills, notes that “there is a wide variety of skills under the label ‘data science’ and people with relevant skills may associate with other disciplines [13].” In recent years, we have seen the emergence of new programs at the interface of the computational sciences and specific application domains, including Biomedical Data Science and Social Data Science.
2.1.1 Association for Computing Machinery (ACM): Computing Competencies for Undergraduate Data Science Curricula, ACM Data Science Task Force (2021)
The ACM Report recognizes Data Science as an “inherently interdisciplinary field” that brings together “domain data, computer science, and the statistical tools for interrogating the data and extracting useful information [14].” The report defines the following computing-focused Knowledge Areas (KAs) for Data Science:
1. Analysis and Presentation (AP)
2. Artificial Intelligence (AI)
3. Big Data Systems (BDS)
4. Computing and Computer Fundamentals (CCF)
5. Data Acquisition, Management, and Governance (DG)
6. Data Mining (DM)
7. Data Privacy, Security, Integrity, and Analysis for Security (DP)
8. Machine Learning (ML)
9. Professionalism (PR)
10. Programming, Data Structures, and Algorithms (PDA)
11. Software Development and Maintenance (SDM)
The report goes on to add that “the above KAs need to be augmented with competencies in calculus, discrete structures, probability theory, elementary statistics, advanced topics in statistics, and linear algebra, among others. A complete curriculum would also include at least one domain context for application of data science concepts and methods.” The table below, excerpted from the report, provides a breakdown of the KAs in terms of sub-domains.
Table 1: Computing Data Science Knowledge Areas (with sub-domains)
Analysis and Presentation
» Foundational considerations
» Visualization
» User-centered design
» Interaction design
» Interface design and development
Artificial Intelligence
» General
» Knowledge representation and reasoning - logic based
» Knowledge representation and reasoning - probability based
» Planning and search strategies
Big Data Systems
» Problems of scale
» Big data computing architectures
» Parallel computing frameworks
» Distributed data storage
» Parallel programming
» Techniques for Big Data applications
» Cloud computing
» Complexity theory
» Software support for Big Data applications
Computing and Computer Fundamentals
» Basic computer architecture
» Storage systems fundamentals
» Operating system basics
» File systems
» Networks
» The web and web programming
» Compilers and interpreters
Data Acquisition, Management, and Governance
» Data acquisition
» Information extraction
» Working with various types of data
» Data integration
» Data reduction and compression
» Data transformation
» Data cleaning
» Data privacy and security
Data Mining
» Proximity measurement
» Data preparation
» Information extraction
» Cluster analysis
» Classification and regression
» Pattern mining
» Outlier detection
» Time series data
» Mining web data
» Information retrieval
Data Privacy, Security, Integrity, and Analysis for Security
» Data privacy
» Data security
» Data integrity
» Analysis for security
Machine Learning
» General
» Supervised learning
» Unsupervised learning
» Mixed methods
» Deep learning
Professionalism
» Continuing professional development
» Communication
» Teamwork
» Economic considerations
» Privacy and confidentiality
» Ethical considerations
» Legal considerations
» Intellectual property
» On automation
Programming, Data Structures, and Algorithms
» Algorithmic thinking and problem solving
» Programming
» Data structures
» Algorithms
» Basic complexity analysis
» Numerical computing
Software Development and Maintenance
» Software design and development
» Software testing
The ACM report is comprehensive, providing a detailed description of topics within each of the sub-domains and highlighting the associated skills that students will acquire through well-defined learning outcomes. The report also addresses some of the challenges for institutions developing new programs, including recruitment of faculty and access to resources.
2.1.2 Park City Report, U.S. (2017)
The Park City Math Institute (PCMI) convened a group of computer science, mathematics, and statistics faculty to develop curriculum guidelines for undergraduate Data Science programs [15]. The report emphasizes the critical role of data, noting that the “recursive data cycle of obtaining, wrangling, curating, managing and processing data, exploring data, defining questions, performing analyses, and communicating the results lies at the core of the data science experience.”
The report identifies the following key competencies for an undergraduate Data Science major:
» Computational and statistical thinking
» Mathematical foundations
» Model building and assessment
» Algorithms and software foundation
» Data curation
» Knowledge transference – communication and responsibility
The document also provides the following curricular content designed to help students acquire the skills and competencies listed above:
» Introduction to Data Science I and II: Introduction to high-level language; Exploring and manipulating data; Functions and basic coding; Introduction to modelling, both deterministic and stochastic; Concepts of projects and code management; Databases; Introduction to data collection and statistical inference
» Mathematical Foundations I and II: Mathematical structures; Linear modeling and matrix computation; Optimization; Multivariate thinking; Probabilistic thinking and modeling
» Computational thinking:
o Algorithms and Software Foundations: Algorithm design; Programming concepts and data structures; Tools and environments; Scaling for big data
o Data Curation—Databases and Data Management: Apply Data query languages to relational databases; Data management including cleaning and initial structuring; Transforming data into structured forms required for exploration, visualization, and analysis
» Statistical thinking:
o Introduction to Statistical Models: Exploratory data analysis approaches and graphical data analysis methods; Estimation and testing; Simulation and resampling; Introduction to models; Introduction to model selection and performance
o Statistical and Machine Learning: Further exploration of alternatives to classical regression and classification; Algorithmic analysis of models, addressing issues of scalability and implementation; Performance metrics and prediction, and cross validation; Data transformations; Supervised learning versus unsupervised learning; Ensemble methods (e.g., boosting, bagging, and model averaging)
» Course in an outside discipline
» Capstone Course: A capstone experience in which students consider scientific questions, collect and analyze data and communicate the results
While the content described above may be found in traditional mathematics, computer science, and statistics courses, the PCMI report recognizes the potential synergies that may result from “interweaving and integration of traditionally siloed topics and tools into a cohesive presentation.”
2.1.3 National Academies Report on Data Science for Undergraduates, U.S. (2018)
The National Academies conducted a study, sponsored by the National Science Foundation, to “set forth a vision for the emerging discipline of data science at the undergraduate level” [16]. While the study did not specifically focus on curricular guidelines, it explored the types of data science skills essential for current undergraduates as well as the future data science workforce, identified some of the challenges of starting a new data science program, and highlighted the need for multiple pathways.
The report underscores the need to instil data acumen as part of the education of all data scientists, and identifies the following concepts as key to developing data acumen:
» Mathematical foundations
» Computational foundations
» Statistical foundations
» Data management and curation
» Data description and visualization
» Data modelling and assessment
» Workflow and reproducibility
» Communication and teamwork
» Domain-specific considerations
» Ethical problem solving
Some of the key recommendations from the study include:
» Academic institutions should embrace data science as a vital new field that requires specifically tailored instruction delivered through majors and minors in data science as well as the development of a cadre of faculty equipped to teach in this new field.
» Academic institutions should provide and evolve a range of educational pathways to prepare students for an array of data science roles in the workplace.
» To prepare their graduates for this new data-driven era, academic institutions should encourage the development of a basic understanding of data science in all undergraduates.
» Ethics is a topic that, given the nature of data science, students should learn and practice throughout their education. Academic institutions should ensure that ethics is woven into the data science curriculum from the beginning and throughout.
» As data science programs develop, they should focus on attracting students with varied backgrounds and degrees of preparation and preparing them for success in a variety of careers.
» Academic institutions should ensure that programs are continuously evaluated and should work together to develop professional approaches to evaluation. This should include developing and sharing measurement and evaluation frameworks, data sets, and a culture of evolution guided by high-quality evaluation. Efforts should be made to establish relationships with sector-specific professional societies to help align education evaluation with market impacts.
» Existing professional societies should coordinate to enable regular convening sessions on data science among their members. Peer review and discussion are essential to share ideas, best practices, and data.
2.1.4 EDISON Data Science Framework project, Europe (2017)
EDISON, an EU-funded project comprising seven partners from six different countries across Europe, was tasked with “accelerating the creation of the Data Science profession” by “aligning industry needs with available career paths, and supporting academies in reviewing their curricula with respect to expected profiles, required expertise and professional certification [17].” The resulting EDISON Data Science Framework comprises four documents: the Data Science Competences Framework, the Data Science Body of Knowledge, the Data Science Model Curriculum, and the Data Science Professional Framework.
The Data Science Body of Knowledge builds on guidelines from other disciplines including Computer Science, Business Analytics, Software Engineering, Data Management and Project Management, and identifies Knowledge Area groups (KAG) and corresponding Knowledge Areas that may be used to develop both undergraduate and graduate programs in Data Science.
Table 2: EDISON Knowledge Area groups and Knowledge Areas
| KA Groups | Suggested DS Knowledge Areas (KA) |
| KAG1-DSDA: Data Science Analytics | Statistical methods for data analysis; Machine Learning; Data Mining; Text Data Mining; Predictive Analytics; Computational modelling, simulation and optimisation |
| KAG2-DSENG: Data Science Engineering | Big Data Infrastructure and Technologies; Infrastructure and platforms for Data Science applications; Cloud Computing technologies for Big Data and Data Analytics; Data and Applications security; Big Data systems organisation and engineering; Data Science (Big Data) applications design; Information systems (to support data driven decision making) |
| KAG3-DSDM: Data Management | General principles and concepts in Data Management and organisation; Data management systems; Data Management and Enterprise data infrastructure; Data Governance; Big Data storage (large scale); Digital libraries and archives |
| KAG4-DSRM: Research Methods and Project Management | Research Methods; Project Management |
| KAG5-DSBPM: Business Analytics | Business Analytics Foundation; Business Analytics organisation and enterprise management |
The documents provide a detailed roadmap for stakeholders from academia and industry “to construct their own structured solutions for educating, training, certifying, recruiting, managing, and otherwise supporting data scientists and other data-dependent professionals.”
2.1.5 Computing Curricula 2020 (CC2020)
The CC2020 initiative, led by the Association for Computing Machinery (ACM) and the IEEE Computer Society (IEEE-CS), was launched “to summarize and synthesize the current state of curricular guidelines for academic programs that grant baccalaureate-level degrees in computing as well as propose a vision for future curricular guidelines [18].” The report ‘Computing Curricula 2020 (CC2020): Paradigms for Global Computing Education’ provides a framework to compare and contrast existing curricular guidelines for the following computing disciplines: computer engineering, computer science, information systems, information technology, software engineering, cybersecurity, and data science (in progress).
The report identifies cloud computing, smart cities, sustainability, parallel computing, internet of things, and edge computing as “current curricular areas”, and lists the following top-ten emerging computing trends: (I) Deep learning (DL) and Machine Learning (ML); (II) Digital Currencies; (III) Blockchain; (IV) Industrial IoT; (V) Robotics; (VI) Assisted Transportation; (VII) Assisted/augmented reality and virtual reality (AR/VR); (VIII) Ethics, laws, and policies for privacy, security, and liability; (IX) Accelerators and 3D; (X) Cybersecurity and AI.
The report recommends a transition from a knowledge-based to a competency-based learning framework and highlights the “need for industry engagement to formulate workplace competencies”.
2.1.6 Other Related Initiatives
The Institute for Operations Research and the Management Sciences (INFORMS) provides guidelines for Analytics programs that focus primarily on the needs of business. These programs typically do not require higher-level Mathematics courses. Wilder and Ozgur define knowledge and skills for an undergraduate program in analytics and develop a curriculum for a proposed Business Analytics Program [19]. Courses include:
» Data Management
» Descriptive Analytics
» Data Visualization
» Predictive Analytics
» Prescriptive Analytics
» Data Mining
» Analytics Practicum
The INFORMS Certified Analytics Professional (CAP) is a global professional certification that provides “an independent verification of the critical technical expertise and soft skills sought by employers across all organizations and industry sectors.” Applicants must have, in an analytics-related field, either a bachelor’s degree with five years of experience or a master’s degree with three years of experience. The exam covers knowledge, skills, and abilities (KSAs) in the following seven domains:
» Business Problem Framing
» Analytics Problem Framing
» Data
» Methodology Selection
» Model Building
» Deployment
» Lifecycle Management
Summary:
Each of the reports and studies discussed above views Data Science through its own disciplinary lens. ACM, not surprisingly, focuses on Computer Science, while the Park City report focuses on Mathematics and Statistics. The CC2020 report highlights computing knowledge and skills, and the EDISON report focuses on training data science professionals, emphasizing areas such as Data Science Engineering and Business Analytics. The National Academies study takes a holistic, high-level view of the emerging discipline of Data Science, emphasizing the need to “prepare diverse students for various careers.”
Despite the different perspectives, there are a number of common threads:
» While the technical requirements may vary for different data science pathways, it is important for students to have some knowledge of the core disciplines of Computer Science, Mathematics, and Statistics that form the foundations of Data Science.
» Students must understand the entire data life cycle from data generation and collection all the way through analysis and interpretation. Access to real-world data and applications is also a critical component of the training.
» Ethics training must be included as part of any Data Science curriculum.
» Communication and Teamwork must be emphasized.
» Given that the landscape is constantly evolving, Data Science programs must adapt their curricula to remain current and address the needs of the future workforce.
2.2 AI
The report, A 20-Year Community Roadmap for Artificial Intelligence Research in the US, views AI “as a branch of computer science that studies the properties of intelligence by synthesizing intelligence.” However, the report acknowledges that the term AI “has become a colloquial term that is used very loosely” and often “equated with machine learning, and specifically with learning from large amounts of data in order to make predictions [3].” The European Commission’s High-Level Expert Group on AI provided the following operational definition of AI [20]:
Artificial intelligence (AI) systems are software (and possibly also hardware) systems designed by humans that, given a complex goal, act in the physical or digital dimension by perceiving their environment through data acquisition, interpreting the collected structured or unstructured data, reasoning on the knowledge, or processing the information, derived from this data and deciding the best action(s) to take to achieve the given goal. AI systems can either use symbolic rules or learn a numeric model, and they can also adapt their behaviour by analysing how the environment is affected by their previous actions.
2.2.1 AI Undergraduate Programs
Since no formal “professionally-endorsed” curriculum guidelines exist for undergraduate programs in AI, we compared the curricula of three programs, one each from China, the United States, and India:
» Nanjing University of Information Science and Technology (NUIST), one of the 35 institutions approved by China’s Ministry of Education in 2019 to offer a four-year AI major.
» Carnegie Mellon University, the first U.S. institution to introduce a Bachelor’s degree in AI in 2018.
» Indian Institute of Technology, Hyderabad, possibly the first Indian institution to introduce the B. Tech degree in AI.
The curriculum for the three programs is summarized below. We have not included general education requirements, specialized courses, or electives from other disciplines.
| | China | United States | India |
| CORE | » Advanced Mathematics » Probability and Statistics » Python Programming » Data Structure and Algorithms » Machine Learning » Optimization » Computer Vision and Pattern Recognition » Neural Networks and Deep Learning » Introduction to Artificial Intelligence » Natural Language Processing » Digital Image Processing » Information Retrieval and Data Mining | » Mathematics » Statistics » Principles of Imperative Computation » Principles of Functional Programming » Parallel and Sequential Data Structures and Algorithms » Introduction to Computer Systems » Concepts in Artificial Intelligence » Introduction to AI: Representation and Problem Solving » Introduction to Machine Learning » Introduction to Natural Language Processing / Computer Vision » Ethics Elective » Cognitive Science / Cognitive Psychology | » Mathematics » Statistics » Introduction to Artificial Intelligence » Programming for AI » Data Structures » Foundations of Machine Learning » AI and Humanity » Convex Optimization » Algorithms » Deep Learning » Ethics and Values » Robotics » Reinforcement Learning » Advanced Topics in ML |
| ELECTIVES | Information Theory, Analysis of Social Networks, Multi-agent System, Software Engineering, Knowledge Engineering, and Database Theory. | One course from each of the four clusters: Decision Making and Robotics; Machine Learning; Perception and Language; Human-AI Interaction Cluster | Courses from five different clusters: Core AI and ML; Language Technologies; Speech and Vision; Data Analytics; AI, Neuroscience and Natural Intelligence |
The three degrees are very similar, requiring foundational training in Mathematics, Statistics, and Computer Science, and courses in Machine Learning, Data Structures, Deep Learning, Algorithms, and NLP. CMU and IIT Hyderabad both require ethics courses as part of the curriculum.
2.2.2 AI Graduate Programs
The PREDICT (Prospective Insights on R&D in ICT) project, supported by the European Commission, “focuses on analysing the supply of Information and Communications Technologies (ICT) and Research and Development (R&D) in ICT in Europe [21].” As part of its mandate to assess the availability of advanced digital skills, an analysis of the AI or AI-related curriculum offered by post-graduate academic programs in 13 European countries (Belgium, Denmark, Finland, France, Germany, Italy, Ireland, Netherlands, Portugal, Spain, and Sweden in the EU; plus Switzerland and the United Kingdom) was conducted. The study leverages existing curricular efforts in other Informatics or Computing domains, including Computer Science, Data Science, Information Technology, and Cybersecurity, and “fills a gap that these do not completely fill in relation with knowledge and competences required by strong AI.”
The report identifies the key building blocks of Master’s programs in terms of the following knowledge areas:
Table 3: Knowledge Areas for a competency-based EU AI curriculum
| Group | Knowledge Area | Scope within AI |
| AI Essentials | FIC: Fundamentals of Informatics/Computing | Concepts, theories, methods and techniques of Informatics or Computing, Computer Science and Software Engineering that are at the foundations of building an intelligent system |
| AI Essentials | FMS: Fundamentals of Maths & Statistics | Concepts, theories, methods and techniques of Mathematics, Probability and Statistics that form the foundations of intelligent systems |
| AI General | AIG: AI General | Systems designed by humans that, given a complex goal, act in the physical or digital dimension by perceiving their environment through data acquisition, interpreting the collected structured or unstructured data, reasoning on the knowledge, or processing the information, derived from this data and deciding the best action(s) to take to achieve the given goal |
| AI Core areas | KRR: Knowledge Representation and Reasoning | Representation of information and knowledge in logic and probabilistic formalisms. Application of automated reasoning methods to the represented information and knowledge |
| AI Core areas | PSO: Planning, Search & Optimisation | Methods for planning and executing solutions by intelligent systems |
| AI Core areas | ML: Machine Learning | Algorithms that improve through experience to identify patterns in data to build models in order to gain valuable information. It includes the processing, analysis and presentation of data |
| AI Core areas | NLP: Natural Language Processing | Collection and parsing of text data to generate and understand human languages |
| AI Core areas | CP: Computational Perception | Interpretation of data in a manner that is similar to the way humans use their senses to relate to the world around them, mainly through vision and audio processing |
| AI Core areas | RAI: Robotics, Agents & Integration | Distribution, coordination, cooperation, and autonomy of intelligent systems with the environment, as well as the combination of other abilities |
| AI Core areas | HMI: Human-Machine Interaction | Interaction of humans with computers and intelligent machines and technologies that let humans interact with computers in effective ways |
| AI Applied areas | AIS: AI Services | Infrastructure, software and platforms provided as digital services or applications to run AI, which are available off-the-shelf and run on demand |
| AI Applied areas | PEA: Philosophy & Ethics of AI | Philosophical and ethical issues associated with AI and related with the compliance of ethical principles and values, including applicable regulation |
The results of the analysis of knowledge topics on core AI subdomains and AI fundamentals on informatics and statistics from all the programs show the following coverage:
» Machine learning (21.4%)
» Fundamental Informatics/Computing (10.9%)
» Computer vision (10.0%)
» Fundamental maths & statistics (8.1%)
» Human-computer interaction (7.1%)
» Knowledge representation and reasoning (7%)
» Natural language processing (6.4%)
» Planning, search and optimisation (5.4%)
» Robotics and intelligent automation (5.4%)
The report notes that many of the programs focus on topics that are not in their specific domains, and very few offer courses in ethics. Machine learning is reported as “the domain most covered by topics of most programs, independently of the initial master program’s aim.” The report highlights the need for a more extensive analysis of curricular requirements in consultation with stakeholders from academia and industry.
While there are a number of comprehensive reports addressing curriculum guidelines and recommendations for Data Science programs, there are very few resources available for AI. With the increasing demand for AI professionals, societies like IEEE, AAAI, and ACM, academia, and industry groups need to come together to identify knowledge and skills needed for different AI careers and develop curricular recommendations.