The moment I told ChatGPT it needed a history lesson, and it agreed
In the realm of artificial intelligence (AI), Ronald Reagan's famous quote, "Trust but verify," remains as relevant as ever. This is particularly true of AI models like ChatGPT and Google's Gemini, which, despite their advanced multimodal capabilities, still struggle to generate accurate maps and images.
Recently, a user's friend uploaded a photo of a group of colleagues at a small offsite meeting to ChatGPT and asked it to add the words "Mahalo from Hawaii 2025" above the group. Instead of just adding the text, the engine altered the image itself, making people thinner, turning men into women, and rendering an Asian colleague as Caucasian. This incident underscores the need for caution and oversight when relying on AI outputs.
These AI engines perform unevenly: they excel at some projects and do terribly at others, such as mapping. For instance, ChatGPT provided a text analysis of the changing borders, major cities, and historical events of the Byzantine Empire but failed to turn that analysis into an easy-to-read map. Google Gemini's results for the same query were even worse, with "Rome" placed in the middle of the Iberian Peninsula and Antioch appearing multiple times across Europe.
The specific project that defeated ChatGPT was a map of the Byzantine Empire's borders in the 5th century. The requested map was supposed to overlay the borders of the Byzantine Empire during the reigns of Theodosius the Great and Marcian and highlight major cities. Instead, ChatGPT invented some names, misspelled others, and placed cities at random on the generated map.
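One way to apply "trust but verify" to the labels on a generated map is to compare each one against a list of names you already trust. The sketch below is only an illustration: the `KNOWN_CITIES` list, the `classify_label` helper, and the 0.8 similarity cutoff are assumptions made for this example, not part of ChatGPT, Gemini, or any mapping tool. It uses Python's standard `difflib` module to separate exact matches, likely misspellings, and names that match nothing.

```python
from difflib import get_close_matches

# Hypothetical reference list for illustration only; a real check would
# draw on a vetted historical gazetteer.
KNOWN_CITIES = [
    "Constantinople",
    "Antioch",
    "Alexandria",
    "Thessalonica",
    "Nicaea",
    "Ephesus",
]


def classify_label(label, known=KNOWN_CITIES, cutoff=0.8):
    """Classify one map label read off an AI-generated image.

    Returns ("ok", name) for an exact match, ("misspelled", closest_name)
    for a near match, or ("unknown", None) when nothing is similar enough.
    """
    if label in known:
        return ("ok", label)
    # get_close_matches returns the best candidates whose similarity
    # ratio is at least `cutoff`; we only need the single best one.
    close = get_close_matches(label, known, n=1, cutoff=cutoff)
    if close:
        return ("misspelled", close[0])
    return ("unknown", None)


if __name__ == "__main__":
    for label in ["Antioch", "Constantinopel", "Xyzzyville"]:
        print(label, "->", classify_label(label))
```

A check like this cannot tell you whether a city belongs on a 5th-century map at all; it only flags labels a human should look at before the map is trusted.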
These limitations are due to several factors. Firstly, AI models generate images or maps based on patterns learned from training data rather than accessing or directly rendering up-to-date, geospatially accurate datasets. While Gemini integrates real-time Google Search and can access current information, the generation of precise cartographic images still relies heavily on interpretation rather than authoritative map data.
Secondly, generated images or maps may lack the fine spatial resolution or detailed layering found in specialized GIS (Geographic Information System) software or official mapping tools. AI models are optimized more for general content and multimodal understanding than for the technical accuracy of geospatial graphics.
Thirdly, AI models are limited in their ability to integrate and process live geospatial data on the fly when generating maps. While AI can summarize or analyze data, it is not a substitute for dedicated mapping databases or systems.
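Because the model is not reading positions from a mapping database, the placement of each city is worth checking against one. The following is a minimal sketch under stated assumptions: the `REFERENCE` table holds approximate modern coordinates chosen for illustration, and `audit_placements` and the 100 km tolerance are hypothetical names and values, not features of any tool discussed here. It uses the haversine great-circle formula to flag cities drawn far from their reference position, and it also flags labels that appear more than once.

```python
import math

# Approximate modern coordinates (latitude, longitude in degrees).
# Illustrative only; an authoritative gazetteer should be used in practice.
REFERENCE = {
    "Rome": (41.90, 12.50),
    "Constantinople": (41.01, 28.98),
    "Antioch": (36.20, 36.16),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def audit_placements(placements, reference=REFERENCE, tolerance_km=100.0):
    """Return a list of problems found in (name, (lat, lon)) placements."""
    problems = []
    seen = set()
    for name, coords in placements:
        if name in seen:
            problems.append(f"{name}: duplicate label")
        seen.add(name)
        if name not in reference:
            problems.append(f"{name}: not in reference")
            continue
        distance = haversine_km(coords, reference[name])
        if distance > tolerance_km:
            problems.append(f"{name}: {distance:.0f} km from reference")
    return problems


if __name__ == "__main__":
    # Placements as a person might transcribe them from a generated map:
    # "Rome" in central Spain and "Antioch" drawn twice.
    transcribed = [
        ("Rome", (40.4, -3.7)),
        ("Antioch", (36.20, 36.16)),
        ("Antioch", (48.9, 2.35)),
    ]
    for problem in audit_placements(transcribed):
        print(problem)
```

The coordinates still have to be transcribed from the image by hand or by another tool, so this is a verification aid, not a replacement for building the map in GIS software from the start.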
Fourthly, some advanced features of Gemini, including deep integration or extended multimodal functions, often sit behind paywalls or in limited previews. This restricts broader user access to the highest-quality map or image generation capabilities that these models might offer in enterprise contexts.
Lastly, both ChatGPT and Gemini excel at multimodal inputs and outputs, including images and videos, but their generation process is interpretive and creative rather than a strictly factual rendering. This means the maps and images they produce may take artistic liberties or contain inferred content rather than adhere strictly to real-world data.
In light of these limitations, it's clear that AI-generated outputs need to be verified for accuracy and appropriateness before acceptance. The rise of machines is a future possibility, but today is not the day. The user remains enthusiastic about AI but believes that we need to be cautious about expectations and recognize that AI is not perfect. After all, as Reagan said, we should trust but verify.
Artificial Intelligence (AI) models like ChatGPT and Google's Gemini exhibit improvements in multimodal understanding, but their performance in accurately generating maps and images remains imperfect. Inaccurate outputs can result from interpretive and creative processing rather than strict adherence to real-world data. Therefore, it's crucial to maintain a cautious approach and verify AI-generated outputs for accuracy and appropriateness.