ChatGPT & Data Science - CSV Cleaning & Data Exploration Prompts
If you've ever tried to get into data science, you might have struggled with how to use CSV data files. You might not have been sure about what the right tool for the task was. "Should I use R or Python or Excel to work with my data?"
Now we have ChatGPT.
ChatGPT is an awesome tool that you can chat with to get work done.
For example, you could ask it, "What is 2 + 2?" and it will respond "2 + 2 is equal to 4."
So what?
Turns out we can also use ChatGPT for some basic data science and data exploration. That's right - you can bring your data into ChatGPT and work with it using plain ol' English text.
This course forgoes the coding that most data exploration and data science courses require. Everything you learn can be done by just using ChatGPT. No Python, no R, no code.
Just you, your data, and the machine.
This course is based on what Elias teaches in his data science 101 courses for the University of Waterloo.
This course contains a collection of prompts with real-world examples, including:
- Create a "friendly" company name from a full one so you can send personalized cold emails. E.g. "A1 Consulting Inc." becomes "A1 Consulting."
- Generate fictional test data in CSV format, for example a list of test companies for use in other software.
- Filter CSV data files. For example, get only the companies with a valuation of over 50 billion dollars.
- Diagnose why ChatGPT did or didn't filter out a row.
- Compute sums and averages across an entire data set. For example, find the total valuation of a group of companies, or the average number of employees in a list of companies.
- Rename columns.
- Clean data. For example, if each row has a value like "35 billion," use ChatGPT to remove the "billion" from each row.
- Create calculated columns from existing data. For example, compute how much each employee contributes to the company valuation on average.
That's just to start. As Elias discovers more prompts, you'll get future additions for free. How cool is that?
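The course itself needs no code, but if you ever want to double-check one of ChatGPT's answers, the operations listed above can be sketched in a few lines of Python. Treat this as a rough illustration only, not part of the course material: the sample data, the column names, the list of legal suffixes, and the helper names `friendly_name` and `load_companies` are all made up for this example.

```python
import csv
import io

# Hypothetical sample data: company names and figures are invented for illustration.
SAMPLE_CSV = """Company Name,Valuation,Employees
A1 Consulting Inc.,35 billion,7000
Borealis Robotics Ltd.,80 billion,20000
Cedar Analytics LLC,60 billion,12000
"""

# Assumed list of legal suffixes to strip; a real list would be longer.
LEGAL_SUFFIXES = ("Inc.", "Ltd.", "LLC", "Corp.")


def friendly_name(full_name):
    """Drop a trailing legal suffix, e.g. 'A1 Consulting Inc.' -> 'A1 Consulting'."""
    name = full_name.strip()
    for suffix in LEGAL_SUFFIXES:
        if name.endswith(" " + suffix):
            return name[: -len(suffix)].rstrip(" ,")
    return name


def load_companies(csv_text):
    """Parse the CSV, rename columns, clean the valuation, and add a calculated column."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Clean data: "35 billion" -> 35.0
        valuation = float(raw["Valuation"].replace("billion", "").strip())
        employees = int(raw["Employees"])
        rows.append({
            "company": friendly_name(raw["Company Name"]),  # renamed column
            "valuation_billions": valuation,                # renamed and cleaned
            "employees": employees,
            # Calculated column: dollars of valuation per employee
            "valuation_per_employee": valuation * 1e9 / employees,
        })
    return rows


companies = load_companies(SAMPLE_CSV)

# Filter: only companies valued at over 50 billion dollars
over_50 = [c for c in companies if c["valuation_billions"] > 50]

# Sum and average across the whole data set
total_valuation = sum(c["valuation_billions"] for c in companies)
average_employees = sum(c["employees"] for c in companies) / len(companies)

print([c["company"] for c in over_50])
print(total_valuation, average_employees)
```

With this invented sample, the filter keeps the two companies valued above 50 billion, which gives you something concrete to compare against ChatGPT's answer when you are diagnosing why a row was or wasn't filtered out.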
About the Course Author, Elias Puurunen
Elias Puurunen is the president of Northern HCI Solutions. He architects big data solutions for Fortune 500 companies, governments, and startups. His data warehouses enable Canada's top high-performance building researchers to produce new electric grid guidance to reduce energy usage and greenhouse gas emissions while maintaining quality of life.
In 2019, Puurunen developed the Coding for Public Policy course (PS 627) for the University of Waterloo's Master of Public Service program, and in 2022 developed the Python & Public Data course for the University of Waterloo's WatSPEED program.
When Elias isn't teaching data science, he hosts awesome virtual and hybrid events with his event company, Tractus Events.
You'll get this course, which includes the sample CSV files and exact prompts.