Tokens: Your AI's Memory Budget
When you're chatting with an AI in Janitor, every word you type, and every word the bot spits back, costs tokens. Think of tokens like little coins you drop into a vending machine: every message you send or receive spends some coins. And the machine? It can only hold so many coins before it runs out of space.
In JLLM, your AI's token "wallet" usually holds between 8,000 and 9,000 tokens (you can check the exact limits on the Janitor Discord if you want to nerd out). Once you hit that limit, the AI starts "forgetting" the oldest bits of your conversation to make room for the new stuff. Kind of like your brain after a long day: it can't keep every detail forever.
More Than Just Words
Tokens aren't exactly the same as words. Sometimes one word is one token, sometimes it's two, and sometimes it's even more. For example (there's a quick sketch after this list if you want to count for yourself):
- "hello" = 1 token
- "can't" = 2 tokens (because of the apostrophe)
- "unbelievable" = 3 tokens
- Emoji? Yep, usually 1 or 2 tokens each.
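If you want to try this yourself, here's a tiny Python sketch using OpenAI's open-source tiktoken library. It is not JLLM's tokenizer, so treat the counts as ballpark figures rather than exact JLLM numbers.

```python
# Rough token counting with tiktoken. JLLM runs its own tokenizer,
# so these counts are estimates, not exact JLLM numbers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common general-purpose encoding

for text in ["hello", "can't", "unbelievable", "🙂"]:
    ids = enc.encode(text)
    print(f"{text!r} -> {len(ids)} token(s): {ids}")
```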
Each model counts tokens a little differently, so JLLM might tokenize your message differently than another LLM would.
How Text Becomes Tokens (and Back)
Here's what happens when you send a message to the AI. First, the text gets broken down into tokens, and each one gets assigned a special ID number. Then the AI converts these token IDs into something called "embeddings," which are fancy math vectors that help it understand meaning and context. Finally, when the AI replies, it turns its internal tokens back into words and sentences you can read. Think of tokenization as the AI's secret handshake for talking human.
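Here's roughly what that round trip looks like in code, again borrowing tiktoken as a stand-in for whatever tokenizer JLLM actually runs. The embedding step happens inside the model itself, so it isn't shown here.

```python
# Text -> token IDs -> text: the round trip described above.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("Tokens are the AI's secret handshake.")
print(ids)              # a list of integer token IDs
print(enc.decode(ids))  # decoded back into the original sentence
```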
Raw text is messy and complicated. Tokenizing turns language into neat little chunks so the AI can work faster and smarter without getting overwhelmed. It also helps the AI spot patterns and connections between words, like a detective piecing together clues. Essentially, tokenization translates human language into math, allowing the AI to actually understand what you mean.
Without tokens, your AI would be like a tourist trying to read a menu in an alien language, confused and probably ordering something weird.
Every model learns its own way of chopping up text based on the data it saw during training. This means tokenizers aren't fixed rulebooks but trained systems with their own quirks for splitting words. What looks like one token to you might be two or three for the AI, or the other way around. This is why online token counters are rough estimates; the only true count comes from the AI's own internal tokenizer. You can create a new character on Janitor to see the accurate token count for JLLM. It's kinda like accents: you and your AI might pronounce a word differently, so you have to check the AI's own dictionary to be sure.
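To see how much counts can drift between tokenizers, you can compare two of tiktoken's built-in encodings on the same sentence. The exact numbers don't matter; the gap between them is the point.

```python
# Two different encodings, two different counts for the same text.
import tiktoken

text = "Unbelievably, the pirate can't stop flirting at the tavern."
for name in ["r50k_base", "cl100k_base"]:
    enc = tiktoken.get_encoding(name)
    print(name, len(enc.encode(text)))
```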
Different Ways to Chop Up Text
There are different ways to slice up the text, each with its own pros and cons. Word tokenization splits text by spaces and punctuation. This is easy to understand, but it can create a huge list of words to remember and struggles with unusual spellings or new words. For example, "running" counts as one token, but "ran," "runs," and "runner" are all separate tokens.
Character tokenization breaks text down into single letters or symbols. This approach uses the smallest vocabulary possible, but it means the AI has to juggle way more tokens, which can slow things down and make it harder for the AI to understand meaning. So, "hello" becomes "h", "e", "l", "l", "o".
Subword tokenization strikes a balance between the two. It keeps common words whole but breaks down rare or tricky words into smaller chunks. Most modern AI models use this method. For example, "annoyingly" might be split into "annoying" plus "ly".
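Here's a toy sketch of the three approaches side by side. Real subword tokenizers (BPE, WordPiece and friends) learn their splits from training data, so the "subword" line below is just a hand-picked illustration.

```python
import re

text = "annoyingly running"

# Word tokenization: split on spaces and punctuation.
word_tokens = re.findall(r"\w+|[^\w\s]", text)

# Character tokenization: one token per letter.
char_tokens = list(text.replace(" ", ""))

# Subword tokenization: common chunks stay whole, rare words get split.
subword_tokens = ["annoying", "ly", "running"]  # hand-picked example

print(word_tokens)     # ['annoyingly', 'running']
print(char_tokens)     # ['a', 'n', 'n', 'o', 'y', ...]
print(subword_tokens)  # ['annoying', 'ly', 'running']
```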
Permanent vs. Temporary Tokens
In JLLM, not all tokens are created equal. Some tokens are like the sturdy foundation of your AI's personality: they stick around no matter what. Others are more like marker on a whiteboard: they appear during the conversation but eventually get wiped away to make room for new stuff.
Let's start with the permanent tokens. These are the pieces that form the backbone of your AI bot. They show up every single time the AI processes a message, no matter how long the chat gets. Permanent tokens include things like Advanced Prompts (APs), Chat Memory (CM), Personality/Prompt, and Scenario.
Temporary tokens, on the other hand, are the messages you and the bot exchange: your words, its replies, and the immediate flow of the chat. Unlike permanent tokens, these get pushed out as new messages come in. Think of it like a whiteboard: you keep writing, and the oldest stuff gets erased to make room.
Here's the catch: the more permanent tokens you use, the less space you leave for temporary tokens, the actual chat itself. So if your bot's permanent setup is huge and complex, you might only have a small token "budget" left for the conversation. It's like trying to write a whole novel on a tiny napkin. Eventually, the story falls apart because you run out of room.
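The math behind that napkin is simple. Here's a back-of-the-envelope sketch with made-up field sizes and a roughly 9,000-token window (check the Janitor Discord for the current limits):

```python
# Budget math: permanent tokens eat into the space left for the chat.
CONTEXT_LIMIT = 9_000  # assumed JLLM-ish window; check Discord for real limits

permanent = {            # example sizes, not recommendations
    "advanced_prompt": 300,
    "chat_memory": 200,
    "personality": 2_500,
    "scenario": 400,
}

permanent_total = sum(permanent.values())
chat_budget = CONTEXT_LIMIT - permanent_total

print(f"Permanent tokens: {permanent_total}")       # 3400
print(f"Left for the conversation: {chat_budget}")  # 5600
```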
Think of your AI as a goldfish with a tiny notepad. You want to jot down just the most important notes that help the conversation, not write a whole encyclopaedia. Keeping permanent info short and sweet makes the chat flow smoother and keeps your AI sharp and focused.
Keeping Bots Lean for Better Performance
It's tempting to think that the more lore, backstory, and personality you cram into a bot, the better it'll perform. But here's the truth: bots with massive permanent setups don't work better; they just break faster.
Bots run on a limited pool of tokens. Every word you feed into your bot's prompt, personality, scenario, advanced prompts, and chat memory eats into that pool. The more space those permanent pieces take up, the less room you have left for actual conversation. Once that limit's hit, the AI doesn't stop; it just quietly starts forgetting the oldest parts of the chat to make room for new ones. No warnings, no errors. Just lost names, missing plot threads, and glitchy, repetitive responses.
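Here's a minimal sketch of that silent forgetting, purely illustrative and not JLLM's actual code: once permanent tokens plus chat history blow past the limit, the oldest messages get dropped first, with no warning.

```python
def trim_history(messages, permanent_tokens, limit, count_tokens):
    """Drop the oldest messages until everything fits under `limit`."""
    kept = list(messages)
    while kept and permanent_tokens + sum(count_tokens(m) for m in kept) > limit:
        kept.pop(0)  # the oldest message quietly disappears
    return kept

# Crude word count standing in for a real tokenizer:
history = [
    "You meet the pirate at the tavern.",
    "He grins and pours you a drink.",
]
print(trim_history(history, permanent_tokens=8_990, limit=9_000,
                   count_tokens=lambda m: len(m.split())))
# -> ['He grins and pours you a drink.']
```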
This kind of silent failure is what trips up a lot of users. A 3,000-token character file might look impressive, but it leaves so little space for active memory that the bot starts stumbling over itself. It'll forget your setup, confuse relationships, and slowly drift into nonsense. This is why the "goldfish with a notepad" metaphor works so well. The AI isn't a librarian that stores everything you've ever told it. It needs quick, useful notes, not a wiki article.
Good bot design is all about cutting the fat. If your pirate doesn't need a 500-word childhood tale to flirt at the tavern, don't include it. Focus on what actually helps the AI do its job. Summarize instead of archiving. Don't copy-paste the same info into multiple fields. Think like a screenwriter, not a historian: what does the bot need to know right now to act right?
And this goes for you as the user, too. If you're deep into a roleplay and something important happened a dozen messages ago, you might need to remind the AI about it. Just give it a quick recap in the Chat Memory (CM). You're the bot's short-term memory assistant, its keeper of context! Because when the AI forgets, it's not being lazy. It just ran out of space.
If things still go sideways and your bot's repeating itself or forgetting key points, don't waste time trying to patch it up. Just start a new chat and transplant. Bring only the essentials with you: the core setup and any major developments the bot needs to remember. Clean slates work wonders.
Bottom line: clear, focused setups lead to better, longer-lasting chats. Most well-made bots perform best with under 1,500 tokens of permanent info. Push past 2,000, and you're already skating on thin ice. When in doubt, less is almost always more.
Next Up: What Makes A Good Persona?
Updated on: 03/08/2025