GPT Index can help simulate Ctrl+F lookup in an external data source. For instance, the ReAct paper (@ShunyuYao12) performs Ctrl+F lookups in Wikipedia.



We provide some examples of how to do this in GPT Index + analysis of tradeoffs/alternative data structures to try out 🧵
GPT Index simulates Ctrl+F by simply 1) creating a list data structure from a data source, and 2) adding a keyword filter during query time.



With 2), GPT Index only processes text chunks containing a keyword, rather than iterating through every chunk.
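To make the two steps concrete, here is a minimal sketch in plain Python. It is not GPT Index's actual API: `filter_chunks` is a hypothetical helper, and the chunks are made-up toy text rather than real Wikipedia content. It only illustrates the idea of keeping a list of chunks and discarding those without the keyword before anything is sent to the model.

```python
from typing import List


def filter_chunks(chunks: List[str], required_keywords: List[str]) -> List[str]:
    """Keep only chunks containing every required keyword (case-insensitive).

    Hypothetical helper for illustration; not part of GPT Index.
    """
    lowered = [kw.lower() for kw in required_keywords]
    return [c for c in chunks if all(kw in c.lower() for kw in lowered)]


# Step 1: a list data structure built from a data source (toy, made-up chunks).
chunks = [
    "Toy chunk A: general background on the disease.",
    "Toy chunk B: a treatment guideline that mentions Tocilizumab.",
    "Toy chunk C: notes on vaccine rollout.",
]

# Step 2: a keyword filter applied at query time.
relevant = filter_chunks(chunks, ["tocilizumab"])
print(relevant)
```

Only the chunks in `relevant` would then be passed to the model to answer the question, which is where the savings over scanning every chunk come from.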
Here’s an example with our Wikipedia🌎 data connector.
We load a page on “Covid-19”, ask it a question (“Which country included tocilizumab in treatment for covid-19?”), and “ctrl-F” for “tocilizumab” to find our answer.

We get the right answer ("China" 🖼️ )
Ctrl-F is a very simple and powerful heuristic for finding relevant information.

*However*, it doesn’t always work! Sometimes there isn’t an obvious keyword to query. Other times the keyword leads to false positives.
For instance, in @paulg’s “What I Worked On”, the phrase “Y Combinator” is sprinkled throughout the entire essay.
Using that phrase as a keyword lookup is ineffective in answering “What did the author do after his time at Y Combinator?”
(the answer dumps the entire essay 🖼️)
Alternative data structures can return better results. Our keyword tables, which use GPT to extract keywords ONLY IF they’re relevant to the text chunk, return a better answer 🖼️
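A rough sketch of why a keyword table can beat plain Ctrl-F, again in plain Python and not GPT Index's actual implementation. The names `build_keyword_table` and `lookup` are hypothetical, the chunks are made-up placeholders (not text from the essay), and the hand-labeled keywords stand in for what GPT would extract.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


def build_keyword_table(
    chunks: List[str], extract_keywords: Callable[[str], List[str]]
) -> Dict[str, Set[int]]:
    """Map each extracted keyword to the indices of the chunks it was extracted from."""
    table: Dict[str, Set[int]] = defaultdict(set)
    for idx, chunk in enumerate(chunks):
        for kw in extract_keywords(chunk):
            table[kw.lower()].add(idx)
    return dict(table)


def lookup(table: Dict[str, Set[int]], query_keywords: List[str]) -> List[int]:
    """Return sorted indices of chunks tagged with any of the query keywords."""
    hits: Set[int] = set()
    for kw in query_keywords:
        hits |= table.get(kw.lower(), set())
    return sorted(hits)


# Made-up placeholder chunks; every one mentions the phrase in passing.
chunks = [
    "Placeholder chunk 0: early projects, mentions Y Combinator once.",
    "Placeholder chunk 1: essays and side work, mentions Y Combinator once.",
    "Placeholder chunk 2: what came after Y Combinator.",
]

# Assumption: stand-in for GPT's extraction, which keeps a keyword only if
# it is actually relevant to the chunk.
hand_labeled = {
    chunks[0]: ["early projects"],
    chunks[1]: ["essays"],
    chunks[2]: ["Y Combinator", "afterwards"],
}

table = build_keyword_table(chunks, lambda chunk: hand_labeled[chunk])
ctrl_f_hits = [i for i, c in enumerate(chunks) if "Y Combinator" in c]
table_hits = lookup(table, ["Y Combinator"])
print(ctrl_f_hits, table_hits)
```

In this toy setup the substring match hits every chunk, while the table lookup narrows to the single chunk tagged with the keyword.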
GPT Index github.com/jerryjliu/gpt_index is constantly iterating on new ways to organize and query information 📁

What are other ways besides “Ctrl-F” that we as humans try to look up and identify answers to questions? Let us know!