IEEE AESS - Part I: LLM Basics

#LLM #WIE

Garrett Hall, a Research Engineer at the Southwest Research Institute, will deliver an introductory presentation on Large Language Models. This talk is the first in a two-part series and covers several fundamental concepts, including tokenization, vector embeddings, and positional encoding.

Tokenization is the process of breaking text into smaller units, called tokens, and converting those tokens into numerical values that machine learning models can understand. Working with these smaller units lets the model process and analyze the data more effectively.
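
To make the idea concrete, here is a minimal, self-contained sketch of word-level tokenization. It is a toy illustration only; production LLM tokenizers use learned subword schemes such as byte-pair encoding, and the `build_vocab` and `tokenize` helpers below are hypothetical names chosen for this example.

```python
# Toy word-level tokenizer: a minimal sketch of the idea only.
# Real LLMs use subword schemes (e.g. byte-pair encoding), but the
# principle is the same: text in, integer token IDs out.

def build_vocab(corpus):
    """Assign a unique integer ID to every distinct word in the corpus."""
    words = sorted({w for sentence in corpus for w in sentence.lower().split()})
    return {word: idx for idx, word in enumerate(words)}

def tokenize(text, vocab):
    """Convert text into a list of token IDs (unknown words map to -1)."""
    return [vocab.get(w, -1) for w in text.lower().split()]

corpus = ["large language models process tokens",
          "tokens are numerical values"]
vocab = build_vocab(corpus)
print(tokenize("language models process numerical tokens", vocab))
```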

Vector embeddings are a crucial next step, where these tokens are transformed into dense vector representations. These vectors capture semantic meaning, enabling the model to understand relationships between words based on their contextual usage. Embeddings essentially map tokens into high-dimensional space where similar words are located closer together.
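
A rough sketch of the lookup step follows, assuming a randomly initialized embedding table in place of the learned one a trained model would use; the cosine-similarity check shows how "closeness" between word vectors is typically measured.

```python
import numpy as np

# Minimal sketch: an embedding table maps each token ID to a dense vector.
# In a trained model these vectors are learned; here they are random
# placeholders, just to show the lookup and a similarity comparison.

vocab_size, embed_dim = 1000, 64
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, embed_dim))

def embed(token_ids):
    """Look up the dense vector for each token ID."""
    return embedding_table[token_ids]

def cosine_similarity(a, b):
    """Similar words end up with vectors pointing in similar directions."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vectors = embed([12, 87, 12])
print(vectors.shape)                               # (3, 64)
print(cosine_similarity(vectors[0], vectors[2]))   # identical tokens -> ~1.0
```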

Positional encoding provides additional information about the order of the words in a sentence, establishing a foundation for sentence structure. It embeds positional information within the tokenized data so that the model can recognize the sequence and context of words, which is essential for understanding the meaning of the text as a whole.
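
One common choice, the sinusoidal encoding used by many transformer models, can be sketched as follows; this is an illustrative implementation, not code from the presentation.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, embed_dim):
    """Classic sinusoidal encoding: each position gets a unique pattern of
    sines and cosines at different frequencies, added to the token embeddings."""
    positions = np.arange(seq_len)[:, None]         # (seq_len, 1)
    dims = np.arange(0, embed_dim, 2)[None, :]      # (1, embed_dim / 2)
    angles = positions / np.power(10000.0, dims / embed_dim)
    pe = np.zeros((seq_len, embed_dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Placeholder token embeddings plus positional information.
token_embeddings = np.random.default_rng(0).normal(size=(5, 64))
model_input = token_embeddings + sinusoidal_positional_encoding(5, 64)
print(model_input.shape)  # (5, 64)
```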

Finally, the presentation will illustrate Retrieval-Augmented Generation (RAG) processes. RAG combines retrieval-based and generative models to enhance the generation of relevant and accurate text by incorporating external information sources. This section will demonstrate how the preceding concepts of tokenization, embeddings, and positional encoding come together in RAG to create more coherent and contextually appropriate text.
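
A highly simplified sketch of the RAG flow is given below: embed the documents, retrieve the ones most similar to the query, and prepend them to the prompt sent to the generator. `embed_text` and `generate` are placeholder stand-ins for a real embedding model and a real LLM, so the retrieval here only illustrates the data flow, not actual semantic matching.

```python
import numpy as np

# Minimal RAG sketch: retrieve the most relevant documents for a query,
# then prepend them to the prompt handed to a generative model.

documents = ["SwRI is headquartered in San Antonio, Texas.",
             "Positional encoding injects word-order information.",
             "Tokenization converts text into integer IDs."]

def embed_text(text):
    """Placeholder embedding: a real system would call an embedding model."""
    seeded = np.random.default_rng(abs(hash(text)) % (2**32))
    return seeded.normal(size=64)

doc_vectors = np.stack([embed_text(d) for d in documents])

def retrieve(query, k=2):
    """Return the k documents whose embeddings are most similar to the query."""
    q = embed_text(query)
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def generate(prompt):
    """Stand-in for the generative model call."""
    return f"[model response conditioned on a prompt of {len(prompt)} characters]"

query = "How do language models represent word order?"
context = "\n".join(retrieve(query))
print(generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"))
```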

 

Cookies and refreshments will be served.

Talk is restricted to US citizens.

Registration required by COB Monday 3/10 for admittance to SwRI grounds on day of event.



  Date and Time

  • Date: 12 Mar 2025
  • Time: 12:00 PM to 01:00 PM
  • All times are (UTC-05:00) Central Time (US & Canada)

  Location

  • 6220 Culebra Rd
  • San Antonio, Texas
  • United States 78238
  • Building: Building 51

  Hosts

  • Contact Event Hosts

  Registration

  • Starts 04 March 2025 12:00 AM
  • Ends 10 March 2025 05:00 PM
  • All times are (UTC-05:00) Central Time (US & Canada)
  • No Admission Charge


  Speakers

Garrett Hall

Topic:

LLM Part I

Biography:

Mr. Hall has worked at Southwest Research Institute since 2019 in the Defense and Intelligence Solutions Division. His AI research focuses on deployable AI models for signal classification; other efforts include Large Language Models and multi-agent reinforcement learning. He holds a master's degree in Electrical Engineering from The University of Texas at San Antonio. Mr. Hall currently serves as the Chair of the IEEE Lone Star Section's AESS/SMC/SYSC Joint Chapter.

Address: Texas, United States