Over the last 12 months, ChatGPT has established itself as the go-to generative AI tool for the mainstream. With over 1 billion monthly active users, it's no surprise that everyone is talking about its impact on the future of humanity. I am one of those billion users: I have used it multiple times each week for the last year, exploring its utility and trying to identify some of the pitfalls.
The premium membership, ChatGPT Plus, has gone through several different guises, but recently the ways of interacting with GPT-4's advanced features have been greatly streamlined through the addition of custom GPTs, the GPT Store, and the integration of image generation, the code interpreter and Bing search within a single chat.
So, with that streamlined approach, I wanted to re-run a little task I gave ChatGPT several months ago to see how much it had improved. The task was based on a weekly challenge from the data visualisation community WorkoutWednesday, which challenged users to recreate a visualisation of the change from normal in Antarctic sea ice extent from 1978 through 2023. The original is below, and you can read the full challenge here.
In my original attempt, I had to use multiple plug-ins (LinkReader and Noteable) because the code interpreter wasn't functional at that point. I couldn't upload source data directly or run any code within the ChatGPT environment; it all had to be done through a Jupyter notebook that we could both access. That initial run took a few hours of trial and error, and unfortunately, because of the nature of the plug-ins and notebooks, I'm no longer able to access the code or the visualisations. What I can tell you is that the challenge took 34 prompts (including edits) and required me to restart the project 3 times because ChatGPT lost its connection to the Jupyter notebook.
I was curious to try the same task in February 2024. Here are some of my findings from using the current version of ChatGPT for data analysis.
For better or worse, ChatGPT now seems more likely to push through multiple steps of a plan in one output, and it will attempt to fix any errors it encounters by itself. This can be an absolute blessing when it goes well, but when it goes wrong the results can get really goofy, as it ends up doubling down on any errors or hallucinations it has made.
Errors can be costly, and going back to edit previous prompts, as you would with a pure language task, can end up in some really odd places. For instance, on my first attempt I got a great initial plot, but when I asked for some minor tweaks it ended up mangling the data in some wacky ways. There was no way back to the state in which things were correct, so I had to start a new chat. The code interpreter also stays active for only a limited time after you leave the chat, so if you return to a previous chat or to a branch of a chat you will lose your previous plots and data. Make sure to download your dataframe and graphs at each successful step to give yourself a good restore point.
I was able to produce the above graph with ChatGPT in just 7 prompts, taking about 10 minutes in total. This in itself is pretty amazing and I was really impressed with the result. There was some strangeness in the 1970s and 1980s plots due to the incompleteness of those decades so I ended up removing them; but otherwise the data points seem to be a good match and I'm sure some more tinkering could have gotten me even closer to the original image.
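For anyone curious what the chart is actually computing, here is a minimal sketch of the "change from normal" calculation. It is my own reconstruction, not the code ChatGPT wrote: I am assuming the baseline is the mean extent for the same day of the year across all available years, and the function name `daily_anomalies` and the synthetic data are invented for illustration. It uses only the Python standard library.

```python
import math
from collections import defaultdict
from datetime import date, timedelta
from statistics import fmean


def daily_anomalies(records):
    """Map year -> {day_of_year: extent minus the all-years mean for that day}.

    `records` is a list of (date, extent) pairs.
    """
    # Collect every extent value observed on each day of the year.
    by_doy = defaultdict(list)
    for day, extent in records:
        by_doy[day.timetuple().tm_yday].append(extent)
    # Assumed baseline: the mean across all years for that day of the year.
    baseline = {doy: fmean(vals) for doy, vals in by_doy.items()}

    anomalies = defaultdict(dict)
    for day, extent in records:
        doy = day.timetuple().tm_yday
        anomalies[day.year][doy] = extent - baseline[doy]
    return anomalies


# Synthetic stand-in data (NOT the real sea ice record): an identical
# seasonal cycle every year, with 2023 artificially lowered by 1.5.
records = []
day = date(2021, 1, 1)
while day <= date(2023, 12, 31):
    doy = day.timetuple().tm_yday
    extent = 11 + 7 * math.cos(2 * math.pi * (doy - 260) / 365)
    if day.year == 2023:
        extent -= 1.5
    records.append((day, extent))
    day += timedelta(days=1)

anomalies = daily_anomalies(records)
for year in sorted(anomalies):
    print(year, round(fmean(anomalies[year].values()), 2))
```

Plotting one line per year from `anomalies[year]` gives the shape of the chart. Note that a year sitting below normal also drags the baseline down, which is one reason a fixed reference period might be preferable to an all-years mean.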
But that's not where ChatGPT's power really lies. Its true power, or at least one of them, is enabling people to have conversations with their data. By allowing people to interact with data using natural language, it's possible to facilitate the rapid development and iteration of hypotheses, analysis plans and results.
I asked ChatGPT to explain what the 2023 results imply and to suggest some further analyses that could be conducted with the data. Here is ChatGPT's interpretation and suggested analyses:
The data for 2023 suggests that the Antarctic ice extent anomaly is significantly lower compared to the "global" average of the same days over the available historical data. Specifically, the red line representing 2023 shows a consistently negative anomaly, meaning that the ice extent for 2023 is less than what is typically expected based on historical data. This could be indicative of a decreasing trend in Antarctic ice extent for the year 2023.
Other analyses that could be conducted with this dataset include:
Trend Analysis:
- Calculate and visualize long-term trends to see if the ice extent is increasing or decreasing over the decades.
- Use statistical methods like Mann-Kendall tests to detect trends in time series data.
Seasonal Analysis:
- Examine the seasonal patterns and variability in the ice extent.
- Determine if there are changes in the timing of seasonal ice extent maxima and minima.
Correlation with Climatic Factors:
- Investigate the relationship between ice extent and climatic factors such as temperature, sea surface temperature, atmospheric circulation patterns, and sea ice concentration.
Forecasting:
- Develop predictive models to forecast future ice extent based on historical data.
- Utilize machine learning techniques for more complex modeling.
Anomaly Detection:
- Use statistical methods to identify years with significant anomalies.
- Determine if these anomalies correlate with any significant climatic events or changes.
Comparative Analysis:
- Compare the Antarctic ice extent with the Arctic ice extent to understand hemispherical differences and similarities.
Impact Assessment:
- Assess the potential impact of ice extent changes on global sea levels.
- Study the ecological impacts, such as changes in habitat for polar species.
Change Point Detection:
- Identify specific points in time where the ice extent data shows abrupt changes.
- Use methods like the Bayesian change point analysis to detect shifts in the data distribution.
Each of these analyses could contribute to a better understanding of the factors driving changes in the Antarctic ice extent and the broader implications for the Earth's climate system.
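As an aside, the Mann-Kendall test mentioned under Trend Analysis is simple enough to sketch by hand. This is a minimal version under stated assumptions: it uses the normal approximation with a continuity correction, it does not correct for tied values, and the function name `mann_kendall` and the example numbers are made up for illustration rather than taken from the sea ice data.

```python
import math


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Assumes no tie correction and a normal approximation for the p-value.
    """
    n = len(values)
    # S = number of later-minus-earlier pairs that rise, minus those that fall.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the no-trend hypothesis (no ties).
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected Z score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Made-up annual values that fall every year, purely for illustration.
example = [12.0, 11.8, 11.7, 11.5, 11.4, 11.1, 10.9, 10.8, 10.4, 10.1]
s, z, p = mann_kendall(example)
print(f"S={s}, Z={z:.2f}, p={p:.5f}")
```

Because the test assumes independent observations, I would feed it one value per year (an annual mean or annual minimum, say) rather than the raw daily series, where neighbouring days are strongly correlated.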
As you can see, ChatGPT suggested several appropriate analyses, and even suggested incorporating additional data sets to make the data more valuable. I then asked ChatGPT which analyses it could perform within its current environment without any additional data, and to suggest a concise and efficient analysis plan for the trend analysis and two versions of the seasonal analysis. Finally, I asked it to perform each analysis, reflect on the findings and hypothesise about the causes of the results based on its own knowledge.
Trend Analysis:
- We conducted a linear regression to analyze the long-term trend of the Antarctic ice extent based on a 7-day rolling average.
- The analysis showed a positive slope, suggesting an overall increasing trend according to the model. However, visual inspection and residual analysis indicated periods, particularly in the mid-2010s, where the actual ice extent was significantly below the model's predictions.
Seasonal Analysis:
- By examining the seasonal decomposition of the data, we identified the typical seasonal cycle of ice extent, which peaks during the Antarctic winter (May to September) and decreases in summer (October to February).
- This analysis highlighted the regular seasonal patterns and the variability within them across different years.
Monthly Distribution Analysis:
- Using box plots, we visualized the distribution of ice extents for each month across all years. This provided insights into the central tendency, variability, and outliers for each month, offering a detailed view of interannual variability and seasonal cycles.
Hypothesised Causes:
- Southern Annular Mode (SAM) shifts, influencing wind patterns and temperatures around Antarctica, possibly affecting sea ice distribution and extent.
- Oceanic Conditions: Changes in ocean temperature and salinity, affecting sea ice from below.
- Global Warming: The overall warming trend leading to higher temperatures and reduced sea ice extent.
- Extreme Weather Events: Increased frequency and intensity of extreme weather could disrupt sea ice formation and stability.
- Ice Shelf Breakup: Events leading to increased iceberg calving, indirectly indicating changing conditions in the Antarctic ice system.
Suggested Next Steps
- Further Statistical Analysis: Apply more complex statistical models to better understand non-linear trends and variability in the ice extent data.
- Climatic Data Integration: Incorporate external climatic and environmental datasets (e.g., temperature, salinity, wind patterns) to investigate their correlation with sea ice extent changes.
- Research on Anomalous Years: Focus on years with significant anomalies, especially the mid-2010s, by reviewing climatic events, research findings, and climate reports from those periods.
- Model Refinement: Considering non-linear models or machine learning approaches to account for the complex interactions in climate data and improve predictions of sea ice extent.
These steps aim to deepen the understanding of Antarctic sea ice dynamics and the impact of climatic and environmental factors on its variability and long-term trends.
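To make the trend step above a little more concrete, here is a rough sketch of a linear regression on a 7-day rolling average. It is not the code ChatGPT ran, which I am not reproducing here. The helper names `rolling_mean` and `linear_trend` are my own, the regression is against the day index rather than calendar dates, and the data is synthetic.

```python
import math
from statistics import fmean


def rolling_mean(values, window=7):
    """Trailing rolling mean; the first window-1 points are dropped."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out


def linear_trend(values):
    """Least-squares fit of values against their index (0, 1, 2, ...).

    Returns (slope, intercept, residuals).
    """
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = fmean(values)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in enumerate(values)]
    return slope, intercept, residuals


# Synthetic daily series (illustrative only): a seasonal cycle plus a small drift.
daily = [12 + 0.0002 * d + 7 * math.cos(2 * math.pi * d / 365.25)
         for d in range(3653)]
smoothed = rolling_mean(daily, window=7)
slope, intercept, residuals = linear_trend(smoothed)
print(f"slope per day: {slope:.6f}")
```

The residuals are the useful part: averaging them by year is one way to spot stretches where the record sits well below the fitted line. Bear in mind that a straight line fitted through a strongly seasonal series can pick up some of the seasonal cycle in its slope, so fitting the anomalies or annual means instead may be the safer choice.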
For me, the value of having such a capable and efficient research assistant at my fingertips is undeniable. I was able to interrogate a dataset that I had no real familiarity with, conduct several analyses, hypothesise about the results and suggest future work, all within a single chat session over the course of about an hour. And that's really still only scratching the surface of what's possible. I'm also able to download the full dataset and analyses for my own work, and can even ask it to prepare a presentation for me.
With that thought in mind, I then had ChatGPT write a 10-slide presentation for me, including reproducing the analysis in another chat, outlining and making notes for each slide, and generating a title for the presentation and an image for the title slide.
The possibilities are extensive here and span as wide as the human imagination. I could also ask ChatGPT to help me write a podcast script, social media post, exec summary, in-depth report or a lesson plan. It could even turn it into a sci-fi novel, a poem or even a haiku. Actually, let's end on that note.