
Interesting examples of models


I am studying the general use of models in biology in terms of methodology, applications, usefulness, etc., and I would really appreciate recommendations of specific examples of models from any area of biology. Some examples are the logistic growth model or the Lotka-Volterra predator-prey model.

It would be nice if you could briefly explain what the model is used for and why you think it's an interesting case, and give a reference for further study.

Also, any recommendations of books on model development and other meta-level questions about modeling in biology are very welcome. Thank you.


Biology is a large field of knowledge! As you give examples drawn from population biology (logistic population growth and the Lotka-Volterra model), I will assume you are mainly interested in ecology and evolution.

For analytical models used in ecology and evolution, I highly recommend the book A Biologist's Guide to Mathematical Modeling in Ecology and Evolution by Otto and Day. The book assumes a relatively low level of mathematical knowledge from the reader and walks through most of the iconic models in ecology and evolution, their analysis, and their interpretation. The mathematical tools used in this book include linear algebra, equilibrium analysis, analysis of cyclic behaviour, Markov chains (especially birth-death processes), diffusion equations, probability theory and stochastic models, approximation theory, class-structured models (the Leslie matrix and more), and separation of time scales, among others.

For books, mainly theoretical, in population genetics, please have a look at the post Books on population or evolutionary genetics?.

As you mention the Lotka-Volterra model, you will find an intro to this model in the post What prevents predator overpopulation?.


One nice model to use as an example is the Hodgkin-Huxley model, which describes the current through a nerve fiber at a given membrane voltage. This model describes the sodium and potassium currents that are the basis of the action potential.

Some characteristics of this model are:

This model explains the membrane currents through the activation of "charged particles" in the membrane that allow the passage of ions (K+ and Na+). It was later discovered that the membrane contains Na+ and K+ channels that carry these currents (when the model was proposed, there was no evidence of the molecular components of these channels).

The model can reproduce the action potential of neurons.

The model predicts the existence of "charged particles" in the membrane that must move before the channels open, named gating charges. Since these particles move from one side of the membrane to the other, they generate a small transient current ("gating currents"). These were actually measured about 20 years after the model was proposed.

This is a great model because it described mathematically and physically the behavior of neurons and also predicted several properties that were later found.
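For reference, the core of the model is a single membrane equation plus first-order kinetics for each gating variable; in standard textbook notation:

```latex
% Membrane equation: the capacitive current balances the applied and ionic currents
C_m \frac{dV}{dt} = I_{\mathrm{ext}}
  - \bar{g}_{\mathrm{Na}}\, m^{3} h \,(V - E_{\mathrm{Na}})
  - \bar{g}_{\mathrm{K}}\, n^{4} \,(V - E_{\mathrm{K}})
  - \bar{g}_{L}\,(V - E_{L})

% Each gating variable x = m, h, n follows voltage-dependent first-order kinetics
\frac{dx}{dt} = \alpha_x(V)\,(1 - x) - \beta_x(V)\, x
```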

You can read more here:

HH model


Descriptive Model

Sanford Friedenthal, Rick Steiner, in A Practical Guide to SysML (Second Edition), 2012

Descriptive Models

A descriptive model describes the domain it represents in a manner that can be interpreted by humans as well as computers. It can be used for many purposes, such as those described in Chapter 2, Section 2.2.2. It can include behavioral, structural, and other descriptions that establish logical relationships about the system, such as its whole-part relationship, the interconnection between its parts, and the allocation of its behavioral elements to structural elements. Descriptive models are generally not built in a manner that directly supports simulation, animation, or execution, but they can be checked for consistency and adherence to the rules of the language, and the logical relationships can be reasoned about.

The system model is a descriptive model that captures the requirements, structure, behavior, and parametric constraints associated with a system and its environment. The system model also captures inter-relationships between elements that represent its requirements, structure, behavior, and parametric constraints. Because its modeling language supports various abstraction techniques, the system model also provides the ability to represent many other views of the system, such as a black-box view, white-box view, or a security view of the system. The system model can also be queried and analyzed for consistency, and serves as an integrating framework as described in Section 18.1.1.


Animal Models

Conclusion

Animal models have greatly improved our understanding of the pathophysiology of ICH (intracerebral hemorrhage) and have provided a valuable platform for testing potential therapeutic strategies. However, as with other preclinical disease models, no single animal model can mimic all clinical features of ICH. To contend with this drawback, investigators who perform translational studies increasingly use more than one ICH model in their work [8]. The most commonly used strategy is to use a combination of the collagenase and the blood injection models to confirm the major findings.

In our efforts to develop effective therapies for ICH that can be verified in preclinical studies as well as in clinical trials, we must be aware of the pitfalls of ICH animal models. (1) Most clinical ICH occurs in middle-aged and aged individuals and in those with hypertension. Therefore, using middle-aged, aged, or spontaneously hypertensive animals provides a more clinically relevant model, but most translational studies are still carried out in young, healthy animals. (2) Considering the spontaneous nature of ICH, a better model for mimicking clinical ICH would entail spontaneous arterial rupture or bleeding and rebleeding. (3) Anesthesia, surgery, needle insertion, and the type of collagenase may all affect the final results. (4) No standard histologic indices correlate with functional outcomes such as behavioral tests, so it is very difficult to evaluate the full therapeutic effect and side effects of a potential drug. (5) The recovery of patients after ICH may take several months, but most preclinical studies focus only on the acute phase. More preclinical studies should examine the later stages or the recovery stage of ICH to gather information on long-term histologic and functional outcomes.

To make progress in ICH treatment, new animal models are needed that better mirror human ICH pathology. To this end, researchers should utilize different ages, genders, or species of animals and test different injection materials in preclinical studies. Additionally, more communication among basic researchers, translational researchers, and physicians can help to select/generate the best ICH models, improve surgical procedures, and optimize outcomes.


2. Epidemiology and Coronavirus

Epidemiology is the study of how diseases spread in populations. Now that we're in one of the largest pandemics in recent memory, it's easy to see why this is such an interesting topic. Finding a vaccine and ways to prevent the spread of COVID-19 has become the single greatest all-hands-on-deck effort of our time.

Even before the coronavirus outbreak, epidemiology intrigued scientists. Epidemiologists are often thought of as modern-day Indiana Joneses because they work in remote jungles and chase dangerous and terrifying diseases like Ebola all around the world.


Science Model Examples

Analogy models are a great way of describing something to students that they are unable to see. Teachers use them all the time when they compare a system to something the students are more familiar with. The more similarities the analogy model has to the target system, the better. Visual representations of these models help students make conceptual links more easily. It can be even more effective to have students create their own analogy models on Storyboard That! Discussions around the similarities and differences between the analogy and the target system are essential after the models have been created. These can be teacher-led in a whole-class setting, or smaller student-led discussions. Analogy models can be useful when students are learning about a wide range of topics in science, especially topics with abstract, hard-to-visualize parts, like electrical circuits.


What is Artificial Intelligence Used for Today?

Several examples of artificial intelligence impact our lives today. These include FaceID on iPhones, the search algorithm on Google, and the recommendation algorithm on Netflix. You’ll also find other examples of how AI is in use today on social media, digital assistants like Alexa, and ride-hailing apps such as Uber.

1. Face Detection and Recognition Technology

Virtual filters on Snapchat and the FaceID unlock on iPhones are two examples of AI applications today. While the former uses face detection technology to identify any face, the latter relies on face recognition.

The TrueDepth camera on Apple devices projects over 30,000 invisible dots to create a depth map of your face. It also captures an infrared image of the user's face.

Apple's FaceID technology helps protect the information users store in their iPhone and iPad Pro. Face ID uses the TrueDepth camera and machine learning for a secure authentication solution. (Still from the iPhone X launch in 2017; image: Apple)

After that, a machine learning algorithm compares the scan of your face with the previously enrolled facial data. That way, it can determine whether or not to unlock the device.
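As a generic illustration of that comparison step (this is not Apple's implementation; the embedding size, encoder, and threshold here are hypothetical), face recognition systems typically reduce each scan to an embedding vector and compare it against the enrolled template:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def matches_enrolled(scan: np.ndarray, enrolled: np.ndarray,
                     threshold: float = 0.8) -> bool:
    """Unlock only if the new scan is close enough to the enrolled template.
    Both vectors would come from a face-encoder model (hypothetical here);
    the threshold is purely illustrative."""
    return cosine_similarity(scan, enrolled) >= threshold

# Stand-in embeddings: the enrolled face plus a slightly changed appearance
enrolled_template = np.random.randn(128)
new_scan = enrolled_template + 0.05 * np.random.randn(128)
print(matches_enrolled(new_scan, enrolled_template))  # True
```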

According to Apple, FaceID automatically adapts to changes in the user's appearance, such as wearing cosmetic makeup, growing facial hair, or wearing hats, glasses, or contact lenses.

The Cupertino-based tech giant also stated that the chance of fooling FaceID is one in a million.

2. Text Editor

Several text editors today rely on artificial intelligence to provide the best writing experience.

For example, document editors use an NLP algorithm to identify incorrect grammar usage and suggest corrections. Besides auto-correction, some writing tools also provide readability and plagiarism grades.

INK is powered by natural language processing technology that allows it to make intelligent SEO recommendations, helping writers and marketers make their content more relevant to their target audience. (INK, inkforall.com)

However, editors such as INK take AI usage a bit further to provide specialized functions, using artificial intelligence to offer smart web content optimization recommendations.

Just recently, INK has released a study showing how its AI-powered writing platform can improve content relevance and help drive traffic to sites. You can read their full study here.

3. Social Media

Social media platforms such as Facebook, Twitter, and Instagram rely heavily on artificial intelligence for various tasks.

Currently, these social media platforms use AI to personalize what you see on your feeds. The model identifies users’ interests and recommends similar content to keep them engaged.

Social media networks use artificial intelligence algorithms to personalize user feeds and filter out harmful content such as hate speech and posts inciting violence and discrimination. (Image: 200degrees/Pixabay.com)

Also, researchers trained AI models to recognize hate keywords, phrases, and symbols in different languages. That way, the algorithm can swiftly take down social media posts that contain hate speech.

Other examples of artificial intelligence in social media include:

  • Emoji as part of predictive text
  • Facial recognition to automatically tag friends in photos
  • Smart filter to identify and remove spam messages
  • Smart replies for quickly responding to messages

Plans for social media platforms involve using artificial intelligence to identify mental health problems. For example, an algorithm could analyze the content a user posts and consumes to detect suicidal tendencies.

4. Chatbots

Getting answers to queries directly from a customer service representative can be very time-consuming. That's where artificial intelligence comes in.

Computer scientists train chat robots or chatbots to impersonate the conversational styles of customer representatives using natural language processing.

Chatbots are currently being used by many businesses to assist potential customers with their queries. (Image: 200degrees/Pixabay.com)

Chatbots can now answer questions that require a detailed response in place of a specific yes or no answer. What’s more, the bots can learn from previous bad ratings to ensure maximum customer satisfaction.

As a result, machines now perform basic tasks such as answering FAQs or taking and tracking orders.

5. Recommendation Algorithm

Media streaming platforms such as Netflix, YouTube, and Spotify rely on a smart recommendation system that’s powered by AI.

First, the system collects data on users’ interests and behavior using various online activities. After that, machine learning and deep learning algorithms analyze the data to predict preferences.

That's why you'll always find movies you're likely to watch in Netflix's recommendations, without having to search any further.
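As a toy illustration of the underlying idea (item-based collaborative filtering on a made-up ratings matrix; real recommenders are far more elaborate):

```python
import numpy as np

# Rows = users, columns = titles; 0 means "not watched yet" (toy data)
titles = ["Drama A", "Thriller B", "Comedy C", "Documentary D"]
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def recommend_for(user_idx: int) -> str:
    """Suggest the unwatched title most similar to what the user rated highly."""
    user = ratings[user_idx]
    norms = np.linalg.norm(ratings, axis=0)
    item_sims = ratings.T @ ratings / (np.outer(norms, norms) + 1e-9)  # title-title cosine similarity
    scores = item_sims @ user              # weight similar titles by this user's ratings
    scores[user > 0] = -np.inf             # never re-recommend something already watched
    return titles[int(np.argmax(scores))]

print(recommend_for(0))  # "Comedy C": the only title user 0 hasn't watched yet
```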

6. Search Algorithm

Search algorithms ensure that the top results on the search engine result page (SERP) have the answers to our queries. But how does this happen?

Search companies usually include some type of quality control algorithm to recognize high-quality content. It then provides a list of search results that best answer the query and offers the best user experience.

Search engines like Google are powered by multiple algorithms that help match people's queries with the best answers available online. (Image: Google)

Since search engines are made entirely of code, they rely on natural language processing (NLP) technology to understand queries.

Last year, Google announced Bidirectional Encoder Representations from Transformers (BERT), an NLP pre-training technique. Now, the technology powers almost all English-based queries on Google Search.
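As a toy illustration of relevance ranking (TF-IDF with scikit-learn; this is not Google's algorithm, which combines many more signals):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "How to bake sourdough bread at home",
    "Bread machine recipes for beginners",
    "Fixing a flat bicycle tire step by step",
]
query = "easy sourdough bread recipe"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)   # one TF-IDF vector per document
query_vector = vectorizer.transform([query])

scores = cosine_similarity(query_vector, doc_vectors)[0]
for idx in scores.argsort()[::-1]:                  # best match first
    print(f"{scores[idx]:.2f}  {documents[idx]}")
```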

7. Digital Assistants

In October 2011, Apple’s Siri became the first digital assistant to be standard on a smartphone. However, voice assistants have come a long way since then.

Today, Google Assistant incorporates advanced NLP and ML to become well-versed in human language. Not only does it understand complex commands, but it also provides satisfactory outputs.

Google Assistant is one of the most popular digital assistants available today. (Image: Kaufdex/Pixabay.com)

Also, digital assistants now have adaptive capabilities for analyzing user preferences, habits, and schedules. That way, they can organize and plan actions such as reminders, prompts, and schedules.

8. Smart Home Devices

Various smart home devices now use AI applications to conserve energy.

For example, smart thermostats such as Nest use our daily habits and heating/cooling preferences to adjust home temperatures. Likewise, smart refrigerators can create shopping lists based on what's missing from the fridge's shelves.

The way we use artificial intelligence at home is still evolving. More AI solutions now analyze human behavior and function accordingly.


Models and Examples

This section contains key information to aid the action research process. First, we present three models or paradigms for action research. Second, we provide some examples from a range of educational research projects that have employed Model 2: Practical Action Research.

Examples of Model 3 are mostly drawn from social and community research and are not included here. Model 1 is no longer widely used in action research and is included here for historical reference only.

Each example is described briefly with reference to the stages in a cycle of an action research project: Question => Plan => Act => Observe => Reflect =>

In fact, the difference between the models is the degree to which an outside researcher influences the action research project in terms of framing the research question and determining the direction the research will take. In Model 1 this influence is considerable, whereas in Model 3 practitioners work collaboratively to define their own problems and identify possible solutions.

Note also that the titles usually given to these models ("Technical", "Practical" and "Emancipatory") are somewhat obscure and do not really give a clear idea of how the model is practiced. We have employed these labels because they are in common usage among action researchers, but the accompanying descriptions give a clearer picture.

We anticipate that within this broad framework, instructors will devise their own models and methods of research, consistent with constraints imposed by their students, availability of support and teaching resources.

Three Action Research Models

Technical Action Research

The practitioner, though a collaborator in the research, is not the main researcher.

The main researcher identifies the action research problem and proposes an intervention.

The practitioner is involved in the implementation of any interventions.

Practical Action Research

Here the researcher and the practitioner identify the research problem together and discuss underlying causes and possible interventions.

Emancipatory Action Research

Practitioners work together as a group and collectively identify problems and possible solutions.

Solutions are as much political and consciousness raising as practical.

There is a strong social element here as well, in that it is expected that participants will emerge with a new view or theory of society.

Example for Practical Action Research


    Action Researchers: Grace Au, Ivan Choi, Patrick Chau, Kar Yan Tam, Ben Petrazzini, Tung Bui, ISMT, School of Business and Management, Hong Kong University of Science and Technology
    Action Researchers: M. J. Davidson, Department of Civil Engineering, Hong Kong University of Science and Technology
    Action Researchers: Anna Yu, Pionie Foo, Irene Ng and Lillian Law, Language Center, Hong Kong University of Science and Technology

Example 1

The Action Research Project:

Using interactive multimedia to support information systems training: system design and learning issues.

Action Researchers: Grace Au, Ivan Choi, Patrick Chau, Kar Yan Tam, Ben Petrazzini, Tung Bui, ISMT, School of Business and Management, Hong Kong University of Science and Technology.

Although interactive multimedia has been used in education for some time, little is known about its effectiveness in enhancing students' learning ability. The aim of the project was to investigate the value of multimedia in supporting teaching and learning processes for information systems training.

The multimedia system used was the first generation of Information Systems Explorer (ISEI), a multimedia electronic slide show. The objective of the system was to stimulate students' interest by adding audio-visual capabilities to the lecture presentation. The plan was to use ISEI as a pilot system to investigate students' learning attitudes and learning outcomes in response to such systems.

The system was used in lectures, replacing the use of transparencies and overhead projectors.

In order to evaluate the effectiveness of the system, feedback was obtained from students about their preference for learning with the system. The effect on students' learning was also evaluated.

Feedback from students indicated that the majority preferred to learn in a lecturing environment accompanied by the ISEI audio-visual presentation. The system helped them to digest information by increasing their interest in the topics covered, and they were able to visualize some concepts in a more vivid manner.

On the other hand, the display quality, the need for dim lighting in the lecture hall, and technical interruptions interfered with students' concentration.

Most students wanted to have access to the slide show for revision purposes. Also, although students preferred the presentation to textbook learning, the system did not necessarily help them to understand how information systems related to business processes in practice.

Review the question

A literature review revealed that problems particular to the subject of information systems were that students find it difficult to visualize how the business world functions and also find it difficult to integrate information systems concepts and apply the knowledge they acquire in answering case study-type questions.

The team asked whether a visual interface could be better designed using a dynamic storyboard to facilitate the formation of the learners' mental model.

The lessons learned with ISEI, together with the findings from the literature review, provided insights into the creation of ISEII, a new version of the system that incorporated innovative concepts in delivering knowledge to students. The design philosophy of ISEII was to use interactive multimedia to provide a desktop virtual-reality environment where students could learn basic information systems concepts by navigating through a simulated office. The system allows students to interact with objects inside the virtual office, review the information associated with each object, and respond to questions asked.

The new system was devised to be used both as an aid to lectures and by students as a self-learning tool during revisions.

The system was integrated with lectures over a one semester period.

In each series of ISEII presentations, students were assigned tasks in different case situations, focusing on how to apply information systems concepts in solving business problems in a simulated company.

In order to evaluate the effectiveness of the design, both objective and subjective data were collected at the end of each training session by means of questionnaire surveys and interviews.

Before the evaluation, the students were given a set of cognitive tests and classified according to their cognitive styles and their technical, analytical and management abilities.

Results of the evaluation indicated that the ISEII system had increased students' incentive to learn and interest in learning. A key lesson learnt was that providing the user with clear goals and an appropriate level of interactivity greatly enhances the effectiveness of the system.

It was found that students tended to lose concentration and interest when the earlier version of the system did not have built-in questions for obtaining instant feedback on the subject. The evaluation also revealed that the learning process is greatly enhanced if the system design is learner-centered, i.e., designed around the end users' mental models and individual needs.

Example 2

The Action Research Project:

Lecturing initiatives with multimedia tools in Engineering education

Action Researchers: M. J. Davidson, Department of Civil Engineering, Hong Kong University of Science and Technology

Initially the fluid mechanics course was typical of most engineering courses in that it contained about 30% more materials than should be taught in the specified time frame. Surveys of students indicated that they found the course difficult and the workload heavy. Instructors were concerned about the students' ability to retain the material that was being taught in the current format.

The plan was to modify the curriculum by splitting the course into two courses to be taught in consecutive semesters, thus reducing the amount of material taught in each course. This enabled greater emphasis to be placed on tutorials and less emphasis on lectures.

At the end of each tutorial, students would be asked to complete one of a number of formats: a formal quiz with no communication and no help; a quiz during which students were allowed to ask questions of the instructor; or a quiz on which students were allowed to work in groups.

The new course structure was implemented over one semester for the first of the two courses.

The second after-tutorial format, which involved students asking questions, was very beneficial in encouraging students to ask questions: there were a greater number of questions during lectures than in the previous format, and many more students came to the instructor's office to ask questions outside the lectures.

Improvements were observed in the course but the instructor thought there were additional things which could be done in future to improve the course further.

Review the question

The revised questions focused on:

  • How to improve students' understanding of the mathematical derivations needed in fluid mechanics.
  • How to make examples of applications more interesting.
  • How to demonstrate the dynamic nature of problems in the course.

Three new initiatives were planned:

  • Instead of developing topics in fluid mechanics through mathematical derivations followed by discussion, the lecture notes were reorganized so that final equations and applications were presented and discussed first. This was done to help understanding of students who were not mathematically sophisticated and to help students to distinguish between key concepts that must be memorized and other material that needed to be understood but not memorized.
  • More local and regional applications were introduced to improve students' motivation and enhance interest. This process could be aided by videos and photographs in the lecture theatre.
  • Animated diagrams and videos of relevant laboratory experiments and local examples were to be used to demonstrate the dynamic nature of problems being considered. This was done to enhance interest and improve students' physical understanding of the problem.

A single multimedia software package was planned to introduce all three initiatives more efficiently and effectively.

Before development of the multimedia software package, student data was gathered to set a baseline for evaluating future improvements and to contribute to development of the package itself.

Several observation methods were used.

University course evaluation was used to gauge the overall effectiveness of the course.

The Study Process Questionnaire (SPQ) was used to provide insight into the students' approaches to learning the material.

A diagnostic survey similar in format to the SPQ but with questions that were specific to the course was designed. The survey aimed to reflect what had been achieved to date and identify problems that still needed to be addressed.

The conceptual quiz was designed to explore misunderstandings of the material taught in the earlier part of the course.

The information provided by the observations provides valuable input into the development of the multimedia software package, and once the package is implemented, surveys and conceptual quizzes will be used to monitor its effectiveness.

Example 3

The Action Research Project:

Training effective writers of business English through interactive feedback

Action Researchers: Anna Yu, Pionie Foo, Irene Ng and Lillian Law, Language Center, Hong Kong University of Science and Technology

In their years as writing consultants, the instructors had found numerous problems in students' business writing: letters often did not meet their writing purposes, essential information was missing, the tone was inappropriate, and set phrases were inserted inappropriately.

Moreover, typographical and surface errors showed that students did not seem to edit their own work.

The plan was to design a four-week intensive business letter writing course to help students overcome the problems in their business letters.

The course introduced a number of innovations over two trials:

Situation analysis checklist: designed to help students think in depth about the multiple facets of the writing situation.

Reader-centeredness in editing: the reader was brought into the editing situation by an audience simulation where another student was asked to role-play the reader and to set out their own expectations of the letter.

Interactive feedback: students obtained feedback from each other in the planning stage while at other stages students and consultants interacted.

Effectiveness of the course was measured in terms of students' attitude and performance before and after the course. Performance was also monitored over an extended period.

Results from the trials were positive and encouraging: students showed significant improvement in the content and tone of the letters they produced.


Data Science Projects

1. “Eat, Rate, Love” — An Exploration of R, Yelp, and the Search for Good Indian Food (Beginner)

When it comes time to choose a restaurant, many people turn to Yelp to determine which is the best option for the type of food they’re in search of. But what happens if you’re looking for a specific type of cuisine and there are many restaurants rated the same within a small radius? Which one do you choose? Robert Chen took Springboard’s Introduction to Data Science course and chose as his capstone project a way to further evaluate Yelp reviewers to determine if their reviews led to the best Indian restaurants.

Chen discovered while searching Yelp that there were many recommended Indian restaurants with close to the same scores. Certainly, not all the reviewers had the same knowledge of this cuisine, right? With this in mind, he took into consideration the following:

  • The number of restaurant reviews by a single person of a particular cuisine (in this case, Indian food). He was able to justify this parameter by looking at reviewers of other cuisines, such as Chinese food.
  • The apparent ethnicity of the reviewer in question. If the reviewer had an Indian name, he could infer that they might be of Indian ethnicity, and therefore more familiar with what constituted good Indian food.
  • He used Python and R programming languages.

His modification to the data and the variables showed that those with Indian names tended to give good reviews to only one restaurant per city out of the 11 cities he analyzed, thus providing a clear choice per city for restaurant patrons.

Yelp's data has become popular among newcomers to data science. You can access it here. Find out more about Robert's project here.

2. NFL Third and Goal Behavior (Intermediate)

The intersection of sports and data is full of opportunities for aspiring data scientists. A lover of both, Divya Parmar decided to focus on the NFL for his capstone project during Springboard’s Introduction to Data Science course.

Divya’s goal: to determine the efficiency of various offensive plays in different tactical situations. Here’s a sample from Divya’s project write-up.

To complete his data science project on the NFL’s 3rd down behavior, Divya followed these steps:

  1. To investigate 3rd down behavior, he obtained play-by-play data from Armchair Analysis; the dataset comprised every play from the first eight weeks of that NFL season. Since the dataset was clean (and roughly 80 percent of the data analysis process is typically cleaning), he was able to focus on the essential data manipulation to create the data frames and graphs for his analysis.
  2. He used R as his programming language of choice for analysis, as it is open source and has thousands of libraries that allow for vast functionality.
  3. He loaded his CSV file into RStudio (his software for the analysis). First, he wanted to look at offensive drives themselves, so he generated a drive number for each drive and attached it to the individual-plays dataset. With that, he could see the length of each drive based on the count of each drive number.
  4. Then, he moved on to his main analysis of 3rd down plays. He created a new data frame that included only 3rd down plays that were a run or a pass (excluding field goals, penalties, etc.). He added a new categorical column named “Distance,” which signified how many yards a team had to go to convert the first down (a rough pandas equivalent is sketched at the end of this example).
  5. Using conventional NFL definitions, he decided on this:

This hands-on project work was the most challenging part of the course for Divya, he said, but it allowed him to practice the different steps in the data science process:

  1. Assessing the problem
  2. Manipulating the data
  3. Delivering actionable insights to stakeholders

You can access the data set Divya used here.
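As a rough illustration of the data-frame manipulation in steps 3 and 4 above: Divya worked in R, and his actual “Distance” cut-offs are not reproduced in this excerpt, so the file name, column names, and buckets in this pandas sketch are placeholders.

```python
import pandas as pd

# Hypothetical play-by-play file with columns: down, yards_to_go, play_type
plays = pd.read_csv("plays.csv")

# Keep only 3rd-down runs and passes (drop field goals, penalties, etc.)
third_downs = plays[(plays["down"] == 3) &
                    (plays["play_type"].isin(["run", "pass"]))].copy()

def distance_bucket(yards: float) -> str:
    """Placeholder cut-offs for the categorical 'Distance' column."""
    if yards <= 2:
        return "short"
    if yards <= 6:
        return "medium"
    return "long"

third_downs["Distance"] = third_downs["yards_to_go"].apply(distance_bucket)
print(third_downs.groupby(["Distance", "play_type"]).size())
```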

3. Who’s a Good Dog? Identifying Dog Breeds Using Neural Networks (Intermediate)

Garrick Chu, another Springboard alum, chose image classification for his final project: identifying dog breeds using neural networks. This project primarily leveraged Keras through Jupyter notebooks and tested the wide variety of skills commonly associated with neural networks and image data:

  • Working with large data sets
  • Effective processing of images (rather than traditional data structures)
  • Network design and tuning
  • Avoiding over-fitting
  • Transfer learning (combining neural nets trained on different data sets)
  • Performing exploratory data analysis to understand model outputs that people can’t directly interpret

One of Garrick’s goals was to determine whether he could build a model that would be better than humans at identifying a dog’s breed from an image. Because this was a learning task with no benchmark for human accuracy, once Garrick optimized the network to his satisfaction, he went on to conduct original survey research in order to make a meaningful comparison.
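As a minimal sketch of the transfer-learning setup described above (not Garrick's actual notebook; the backbone, image size, layer sizes, and breed count are placeholders):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_BREEDS = 120  # placeholder; depends on the data set used

# A frozen ImageNet-pretrained backbone supplies generic image features
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # transfer learning: only the new head is trained

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),                          # helps guard against over-fitting
    layers.Dense(NUM_BREEDS, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # given an image dataset pipeline
```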

See more of Garrick's work here. You can access the data set he used here.

4. Amazon vs. eBay Analysis (Advanced)

Ever pulled the trigger on a purchase only to discover shortly afterward that the item was significantly cheaper at another outlet?

In support of a Chrome extension he was building, Chase Roberts decided to compare the prices of 3,500 products on eBay and Amazon. With his biases acknowledged, Chase walks readers of this blog post through his project, starting with how he gathered the data and documenting the challenges he faced during this process.

The results showed the potential for substantial savings. For his project, Chase built a shopping cart with 3.5k products to compare prices on eBay vs Amazon. Here’s what he found:

  1. The shopping cart has 3,520 unique items.
  2. If you chose the wrong platform to buy each of these items (by always shopping at whichever site has the more expensive price), this cart would cost you $193,498.45. (Or you could pay off your mortgage.)
  3. This is the worst-case scenario for the shopping cart.
  4. The best-case scenario for our shopping cart, assuming you found the lowest price between eBay and Amazon on every item, is $149,650.94.
  5. This is a $44,000 difference — or 23%!

Find out more about the project here .

5. Fake News! (Advanced)

Another great idea for a data science project is looking at the common forms of fake news. These days, it’s hard enough for the average social media user to determine when an article is made up with an intention to deceive. So is it possible to build a model that can discern whether a news piece is credible? That’s the question a four-person team from the University of California at Berkeley attempted to answer with this project.

First, the team identified two common forms of fake news to focus on: clickbait (“shocking headlines meant to generate clicks to increase ad revenue”) and propaganda (“intentionally misleading or deceptive articles meant to promote the author’s agenda”).

To develop a classifier that would be able to detect clickbait and propaganda articles, these steps were followed:

  1. The foursome scraped data from news sources listed on OpenSources
  2. Preprocessed articles for content-based classification using natural language processing
  3. Trained different machine learning models to classify the news articles (see the sketch after this list)
  4. Created a web application to serve as the front end for their classifier
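As a bare-bones sketch of steps 2 and 3 (not the Berkeley team's actual pipeline; the example articles and labels are invented), a TF-IDF plus logistic-regression classifier in scikit-learn could look like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled articles; a real project would use the scraped OpenSources data
texts = [
    "You won't BELIEVE what this celebrity did next!!!",
    "City council approves budget for new water treatment plant.",
    "Secret cabal controls the weather, insiders reveal.",
    "Quarterly inflation figures released by the statistics bureau.",
]
labels = ["clickbait", "credible", "propaganda", "credible"]

# Content-based preprocessing (TF-IDF bag of words) + classifier in one pipeline
clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

print(clf.predict(["Shocking trick doctors don't want you to know"]))
```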

Find out more and try it out here .

6. Audio Snowflake (Advanced)

When you think about data science projects, chances are you think about how to solve a particular problem, as seen in the examples above. But what about creating a project for the sheer beauty of the data? That’s exactly what Wendy Dherin did.

The purpose of her Hackbright Academy project was to create a stunning visual representation of music as it played, capturing a number of components, such as tempo, duration, key, and mood. The web application Wendy created uses an embedded Spotify web player, an API to scrape detailed song data, and trigonometry to move a series of colorful shapes around the screen. Audio Snowflake maps both quantitative and qualitative characteristics of songs to visual traits such as color, saturation, rotation speed, and the shapes of figures it generates.

She explains a bit about how it works:

Each line forms a geometric shape called a hypotrochoid (pronounced hai-po-tro-koid).

Hypotrochoids are mathematical roulettes traced by a point P that is attached to a circle which rolls around the interior of a larger circle. If you have played with Spirograph, you may be familiar with the concept.

The shape of any hypotrochoid is determined by the radius a of the large circle, the radius b of the small circle, and the distance h between the center of the smaller circle and point P.

For Audio Snowflake, these values are determined as follows:

  • a: song duration
  • b: section duration
  • h: song duration minus section duration
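Given those mappings, the curve itself follows the standard hypotrochoid parametric equations; here is a small matplotlib sketch with illustrative values standing in for real song data:

```python
import numpy as np
import matplotlib.pyplot as plt

def hypotrochoid(a: float, b: float, h: float, t_max: float, n_points: int = 20000):
    """Point at distance h from the center of a circle of radius b
    rolling inside a circle of radius a (standard parametric form)."""
    t = np.linspace(0, t_max, n_points)
    x = (a - b) * np.cos(t) + h * np.cos((a - b) / b * t)
    y = (a - b) * np.sin(t) - h * np.sin((a - b) / b * t)
    return x, y

# Illustrative stand-ins for song and section durations (in seconds)
a, b = 240, 35
h = a - b                      # Audio Snowflake's choice: song minus section duration
x, y = hypotrochoid(a, b, h, t_max=14 * np.pi)  # enough revolutions to close the curve here

plt.plot(x, y, linewidth=0.5)
plt.axis("equal")
plt.show()
```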

Bonus Data Sets for Data Science Projects

Here are a few more data sets to consider as you ponder data science project ideas:

  • An audio-visual data set consisting of short clips of human speech, extracted from interviews uploaded to YouTube.
  • A classic data set appropriate for data science projects for beginners.
  • A fairly small data set based on U.S. Census Bureau data that's focused on a regression problem.
  • A retail industry data set (the Big Mart sales data referenced below) that can be used to predict store sales.
  • FiveThirtyEight: Nate Silver's publication shares the data and code behind some of its articles and graphics so admirers can create stories and visualizations of their own.

Tips for Creating Cool Data Science Projects

Getting started on your own data science project may seem daunting at first, which is why at Springboard, we pair students with one-on-one mentors and student advisors who help guide them through the process.

When you start your data science project, you need to come up with a problem that you can use data to help solve. It could be a simple problem or a complex one, depending on how much data you have, how many variables you must consider, and how complicated the programming is.

Choose the Right Problem

If you’re a data science beginner, it’s best to consider problems that have limited data and variables. Otherwise, your project may get too complex too quickly, potentially deterring you from moving forward. Choose one of the data sets in this post, or look for something in real life that has a limited data set. Data wrangling can be tedious work, so it’s key, especially when starting out, to make sure the data you’re manipulating and the larger topic is interesting to you. These are challenging projects, but they should be fun!

Breaking Up the Project Into Manageable Pieces

Your next task is to outline the steps you’ll need to take to create your data science project. Once you have your outline, you can tackle the problem and come up with a model that may prove your hypothesis. You can do this in six steps:

  1. Generate your hypotheses
  2. Study the data
  3. Clean the data
  4. Engineer the features
  5. Create predictive models
  6. Communicate your results

Generate Your Hypotheses

After you have your problem, you need to create at least one hypothesis that will help solve the problem. The hypothesis is your belief about how the data reacts to certain variables. For example, if you are working with the Big Mart data set that we included among the bonus options above, you may make the hypothesis that stores located in affluent neighborhoods are more likely to see higher sales of expensive coffee than those stores in less affluent neighborhoods.

This is, of course, dependent on you obtaining general demographics of certain neighborhoods. You will need to create as many hypotheses as you need to solve the problem.

Study the Data

Your hypotheses need data that will allow you to prove or disprove them. This is where you need to look in the data set for variables that affect the problem. In the Big Mart example, you'll be looking for data that can serve as those variables. In the coffee hypothesis, you need to be able to identify brands of coffee, prices, sales, and the surrounding neighborhood demographics of each store. If you do not have the data, you either have to dig deeper or change your hypothesis.

Clean the Data

As much as data scientists prefer to have clean, ready-to-go data, the reality is seldom neat or orderly. You may have outlier data that you can’t readily explain, like a sudden large, one-time purchase of expensive coffee in a store that is in a lower-income neighborhood or a dip in coffee purchases that you wouldn’t expect during a random two-week period (using the Big Mart scenario). Or maybe one store didn’t report data for a week.

These are all cases where the data deviates from the norm. Here, it's up to you as a data scientist to remove the outliers and fill in missing data so that the data set is more or less consistent. Without these changes, the outliers will skew your results, sometimes drastically.

With the problem you’re trying to solve, you aren’t looking for exceptions, but rather you’re looking for trends. Those trends are what will help predict profits at the Big Mart stores.
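A small pandas sketch of that kind of cleanup, assuming hypothetical file and column names rather than the actual Big Mart schema:

```python
import pandas as pd

sales = pd.read_csv("big_mart_sales.csv")   # hypothetical file and columns

# Fill gaps (e.g., a store that didn't report for a week) with that store's median
sales["coffee_sales"] = (sales.groupby("store_id")["coffee_sales"]
                              .transform(lambda s: s.fillna(s.median())))

# Then drop extreme outliers, such as a one-off bulk purchase,
# keeping values within 3 standard deviations of the mean
coffee = sales["coffee_sales"]
sales = sales[(coffee - coffee.mean()).abs() <= 3 * coffee.std()]
```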

Engineer the Features

At this stage, you need to start assigning variables to your data. You need to factor in what will affect your data. Does a heatwave during the summer cause coffee sales to drop? Does the holiday season affect sales of high-end coffee in all stores and not just middle-to-high-income neighborhoods? Things like seasonal purchases become variables you need to account for.

You may have to modify certain variables you created in order to get a better prediction of sales. For example, maybe sales of high-end coffee aren't an indicator of profits, but whether the store sells a lot of holiday merchandise is. You'd have to examine and tweak the variables that make the most sense for solving your problem.

Create Your Predictive Models

At some point, you'll have to come up with predictive models to support your hypotheses. For example, you'll have to design code showing that when certain variables occur, sales fluctuate. For Big Mart, your predictive models might include holidays and other times of the year when retail sales spike. You may explore whether an after-Christmas sale increases profits and, if so, by how much. You may find that a certain percentage of sales earn more money than other sales, given the volume and overall profit.
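As a toy sketch of such a predictive model (the features, numbers, and choice of a plain linear regression are illustrative only):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy weekly records: [is_holiday_week, is_post_christmas_sale, avg_item_price]
X = np.array([
    [0, 0, 4.5],
    [1, 0, 4.5],
    [0, 1, 3.0],
    [1, 1, 3.0],
    [0, 0, 5.0],
])
weekly_sales = np.array([1000, 1400, 1250, 1700, 950])

model = LinearRegression().fit(X, weekly_sales)

# Predicted sales for a holiday week with an after-Christmas sale running
print(model.predict([[1, 1, 3.0]]))
print(dict(zip(["holiday", "post_christmas_sale", "avg_price"], model.coef_)))
```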

Communicate Your Results

In the real world, all the analysis and technical results that you come up with are of little value unless you can explain to your stakeholders what they mean in a way that’s comprehensible and compelling. Data storytelling is a critical and underrated skill that you must develop. To finish your project, you’ll want to create a data visualization or a presentation that explains your results to non-technical folks.

Bonus: How Many Projects Should Be in a Data Science Portfolio?

Data scientist and Springboard mentor David Yakobovitch recently shared expertise on how to optimize a data science portfolio with our data science student community. Among the advice he shared were these tips:

For the Data Science Career Track, we have two capstones that students work on, so I like to say a minimum of two projects in your portfolio. Often when I work with students and they’ve finished the capstones and they’re starting the job search, I say, “Why not start a third project?” That could be using data sets on popular sites such as Kaggle or using a passion project you’re interested in or partnering with a non-profit.

When you’re doing these interviews, you want to have multiple projects you can talk about. If you’re just talking about one project for a 30- to 60-minute interview, it doesn’t give you enough material. So that’s why it’s great to have two or three, because you could talk about the whole workflow—and ideally, these projects work on different components of data science.

Learning the theory behind data science is an important part of the process. But project-based learning is the key to fully understanding the data science process. Springboard emphasizes data science projects in all three data science courses. The Data Science Career Track features 14 real-world projects, including two industry-worthy capstone projects.

Interested in a project-based learning program that comes with the support of a mentor? Check out our Data Science Career Track—you’ll learn the skills and get the personalized guidance you need to land the job you want.


A Lesson on Modern Classification Models

In machine learning, classification problems are one of the most fundamentally exciting and yet challenging existing problems. The implications of a competent classification model are enormous — these models are leveraged for natural language processing text classification, image recognition, data prediction, reinforcement training, and a countless number of further applications.

However, the present implementations of classification algorithms are terrible. During my time at Facebook, I found that the generic solution to any machine learning classification problem was to “throw a gradient boosting tree at it and hope for the best”. But this should not be the case — research is being put into modern classification algorithms and improvements that allow significantly more accurate models with considerably less training data.

Here, we explore some particularly interesting examples of modern classification algorithms. This article assumes some level of familiarity with machine learning; however, the majority of the post should still be accessible without it.

Deep Neural Decision Trees

Deep neural networks have been proven powerful at processing perceptual data, such as images and audio. However, for tabular data, tree-based models are more popular for a couple reasons — one significant one being that they offer natural interpretability.

For example, consider a business that is attempting to determine why a system is failing. You would make a predictive model with several parameters — some examples are network speed, uptime, threads processing, and type of system. With a decision tree, we can additionally get a sense of why a system is failing.

Deep neural decision trees (DNDTs) combine the utility of a decision tree with the power of neural networks. Because it is implemented with a neural network, a DNDT supports out-of-the-box GPU acceleration and mini-batch learning on datasets that do not fit in memory, thanks to modern deep learning frameworks. They have also been shown to be more accurate than traditional decision trees on many datasets. Furthermore, they are easy to use — an implementation involves about 20 lines of code in TensorFlow or PyTorch.

We can explore the concepts behind the core math of the model. First, we need a way to make the split decisions (i.e., how we decide which path of the tree to take). Then, we need to combine our split decisions to build the decision tree.

We make the split decision via a binning function. A binning function takes an input and produces the index of the bin to which the input belongs. In our model, each feature — for example, uptime, network speed, or system type — is binned separately.

We bin each feature via its own one-layer neural network (the “deep neural” part of our model), whose activation is a temperature-scaled softmax over shifted copies of the input (sketched below).

We then multiply our bins with each other via a Kronecker product to construct our decision tree.
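A rough NumPy sketch of those two pieces, following the soft-binning construction described in the Deep Neural Decision Trees paper (the cut points, feature values, and temperature below are illustrative):

```python
import numpy as np

def soft_bin(x: float, cut_points: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    """Differentiable binning of one scalar feature.

    Equivalent to a one-layer net: softmax((w * x + b) / temperature) with
    w = [1, 2, ..., n + 1] and b the negative cumulative sum of the cut points.
    Returns an (almost) one-hot vector over the n + 1 bins."""
    n = len(cut_points)
    w = np.arange(1, n + 2)
    b = np.concatenate(([0.0], -np.cumsum(cut_points)))
    logits = (w * x + b) / temperature
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()

# Two features, each binned by its own cut points (illustrative values)
uptime_bins = soft_bin(x=0.7, cut_points=np.array([0.3, 0.8]))   # 3 bins
speed_bins = soft_bin(x=0.2, cut_points=np.array([0.5]))         # 2 bins

# The Kronecker product of the per-feature bin vectors indexes the tree's leaves
leaves = np.kron(uptime_bins, speed_bins)
print(leaves.round(3))   # a distribution over the 3 x 2 = 6 leaves
# In the full model, a final linear layer maps each leaf to class scores.
```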


What Are Some Examples of Physical Models?

Scientists use many kinds of physical models to predict and understand what they cannot observe directly. Physical models range from the Bohr model of the atom to models of the solar system that illustrate the planets' orbits around the sun. Usually, models make something very small larger or something very large smaller.

While the Bohr model of the atom is not accurate in its portrayal of the nature of the orbits of the electrons, it was the first physical model that incorporated quantum theory and provided an understanding of electron behavior, according to Encyclopedia Britannica. Each physical model has its own limitations and does not always provide a complete representation of what occurs in nature.
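For example, the Bohr model's central quantitative result, the allowed electron energy levels of hydrogen, is usually written as:

```latex
E_n = -\frac{m_e e^{4}}{8 \varepsilon_0^{2} h^{2}} \cdot \frac{1}{n^{2}}
    \approx -\frac{13.6\ \mathrm{eV}}{n^{2}}, \qquad n = 1, 2, 3, \ldots
```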

The double helix is a physical model of DNA that aids scientists in visualizing the structure of DNA and its function in gene reproduction. The creation of this model, like many others, was dependent on the use of experimental data.

Scientists often use computer-generated models in conjunction with physical models. Although there are limitations, computer-generated models can provide greater detail than physical models. For example, a computer model of the universe described by The Atlantic is rendered as a cube 350 million light-years on each side, built from 12 billion three-dimensional pixels. Attempting to capture the universe's size with a physical model would yield a much less impressive result.

