ChatGPT for GTFS: benchmarking LLMs on GTFS semantics... and retrieval | Public Transport Skip to main content
Log in

ChatGPT for GTFS: benchmarking LLMs on GTFS semantics... and retrieval

  • Original Research
  • Published:
Public Transport Aims and scope Submit manuscript

Abstract

The General Transit Feed Specification (GTFS) standard for publishing transit data is ubiquitous. With the advent of LLMs being used widely, this research explores the possibility of extracting transit information from GTFS through natural language instructions. To evaluate the capabilities and limitations of LLMs, we introduce two benchmarks, namely “GTFS Semantics” and “GTFS Retrieval” that test how well LLMs can “understand” GTFS standards and retrieve relevant transit information. We benchmark OpenAI’s GPT-3.5 Turbo and GPT-4 LLMs, which are backends for the ChatGPT interface. In particular, we use zero-shot, one-shot, chain of thought, and program synthesis techniques with prompt engineering. For our multiple questions, GPT-3.5 Turbo answers 59.7% correctly and GPT-4 answers 73.3% correctly, but they do worse when one of the multiple choice options is replaced by “None of these”. Furthermore, we evaluate how well the LLMs can extract information from a filtered GTFS feed containing four bus routes from the Chicago Transit Authority. Program synthesis techniques outperformed zero-shot approaches, achieving up to 93% (90%) accuracy for simple queries and 61% (41%) for complex ones using GPT-4 (GPT-3.5 Turbo).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Data Availability

Both ‘GTFS Semantics’ and ‘GTFS Retrieval’ benchmarks along with the filtered GTFS data and questionnaire used in this paper are available at https://github.com/UTEL-UIUC/GTFS_LLM.

Notes

  1. Note g is not a mathematical or computer function, but rather stands in for the grading process. Different types of answers are graded in different ways. Multiple-choice answers are graded automatically by a script that carries out string-matching, but the program synthesis questions require copying the code to run in a Python terminal.

  2. Example Q &A available at https://platform.openai.com/examples/default-qa [Accessed 2023-07-29]

  3. OpenAI Chat Completions API: https://platform.openai.com/docs/api-reference/completions/create [Accessed 2023-07-29]

  4. Open AI Python Library https://github.com/openai/openai-python

  5. The CTA feed is available for download at https://transitfeeds.com/p/chicago-transit-authority/165/20230503

References

Download references

Author information

Authors and Affiliations

Authors

Contributions

The authors confirm their contribution to the paper as follows: study conception and design: SD; data collection: SD, SQ; analysis and interpretation of results: SD, SQ, LL; draft manuscript preparation: SD, SQ, LL. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Saipraneeth Devunuri.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

See Figs. 6, 7, 8.

Fig. 6
figure 6

GPT-3.5 Turbo makes mistakes with ‘Categorical Mapping’. The difference between ZS and CoT is the single instruction change highlighted in the system prompt. The ‘route_type’ for the bus is ‘3’ according to the GTFS documentaion

Fig. 7
figure 7

Example Prompt and Response for Zero-shot GTFS Information Retrieval. The ‘Example User’ and ‘Example Assistant’ are used to imitate a conversation. The response is generated by the GPT-3.5 Turbo model

Fig. 8
figure 8

Some examples where GPT-4 makes mistakes with GTFS Information Retrieval. The “System” prompt is the same as the one in Fig. 2

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Devunuri, S., Qiam, S. & Lehe, L.J. ChatGPT for GTFS: benchmarking LLMs on GTFS semantics... and retrieval. Public Transp 16, 333–357 (2024). https://doi.org/10.1007/s12469-024-00354-x

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s12469-024-00354-x

Keywords

Navigation