Tokens: Your AI's Memory Budget
When you're chatting with an AI in Janitor, every word you type, and every word the bot spits back, costs tokens. Think of tokens like little coins you drop into a vending machine: every message you send or receive spends some coins. And the machine? It can only hold so many coins before it runs out of space.
In JLLM, your AI's token "wallet" usually holds between 8,000 and 9,000 tokens (you can check the exact limits on the Janitor Discord if you want to nerd out). Once you hit that limit, the AI starts "forgetting" the oldest bits of your conversation to make room for the new stuff. Kind of like your brain after a long day: it can't keep every detail forever.
More Than Just Words
Tokens aren't exactly the same as words. Sometimes one word is one token, sometimes it's two, and sometimes it's even more. For example (there's a quick sketch after this list if you want to count for yourself):
- "hello" = 1 token
- "can't" = 2 tokens (because of the apostrophe)
- "unbelievable" = 3 tokens
- Emoji? Yep, usually 1 or 2 tokens each.
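If you want to try this yourself, here's a tiny Python sketch using OpenAI's open-source tiktoken library. It is not JLLM's tokenizer, so treat the counts as ballpark figures rather than exact JLLM numbers.

```python
# Rough token counting with tiktoken. JLLM runs its own tokenizer,
# so these counts are estimates, not exact JLLM numbers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common general-purpose encoding

for text in ["hello", "can't", "unbelievable", "🙂"]:
    ids = enc.encode(text)
    print(f"{text!r} -> {len(ids)} token(s): {ids}")
```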
Each model counts tokens a little differently, so JLLM might tokenize your message differently than another LLM would.
How Text Becomes Tokens (and Back)
Here's what happens when you send a message to the AI. First, the text gets broken down into tokens, and each one gets assigned a special ID number. Then the AI converts these token IDs into something called "embeddings," which are fancy math vectors that help it understand meaning and context. Finally, when the AI replies, it turns its internal tokens back into words and sentences you can read. Think of tokenization as the AI's secret handshake for talking human.
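Here's roughly what that round trip looks like in code, again borrowing tiktoken as a stand-in for whatever tokenizer JLLM actually runs. The embedding step happens inside the model itself, so it isn't shown here.

```python
# Text -> token IDs -> text: the round trip described above.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("Tokens are the AI's secret handshake.")
print(ids)              # a list of integer token IDs
print(enc.decode(ids))  # decoded back into the original sentence
```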
Raw text is messy and complicated. Tokenizing turns language into neat little chunks so the AI can work faster and smarter without getting overwhelmed. It also helps the AI spot patterns and connections between words, like a detective piecing together clues. Essentially, tokenization translates human language into math, allowing the AI to actually understand what you mean.
Without tokens, your AI would be like a tourist trying to read a menu in an alien language, confused and probably ordering something weird.
Every model learns its own way of chopping up text based on the data it saw during training. This means tokenizers aren't fixed rulebooks but trained systems with their own quirks for splitting words. What looks like one token to you might be two or three for the AI, or the other way around. This is why online token counters are rough estimates; the only true count comes from the AI's own internal tokenizer. You can create a new character on Janitor to see the accurate token count for JLLM. It's kinda like accents: you and your AI might pronounce a word differently, so you have to check the AI's own dictionary to be sure.
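To see how much counts can drift between tokenizers, you can compare two of tiktoken's built-in encodings on the same sentence. The exact numbers don't matter; the gap between them is the point.

```python
# Two different encodings, two different counts for the same text.
import tiktoken

text = "Unbelievably, the pirate can't stop flirting at the tavern."
for name in ["r50k_base", "cl100k_base"]:
    enc = tiktoken.get_encoding(name)
    print(name, len(enc.encode(text)))
```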
Different Ways to Chop Up Text
There are different ways to slice up the text, each with its own pros and cons. Word tokenization splits text by spaces and punctuation. This is easy to understand, but it can create a huge list of words to remember and struggles with unusual spellings or new words. For example, "running" counts as one token, but "ran," "runs," and "runner" are all separate tokens.
Character tokenization breaks text down into single letters or symbols. This approach uses the smallest vocabulary possible, but it means the AI has to juggle way more tokens, which can slow things down and make it harder for the AI to understand meaning. So, "hello" becomes "h", "e", "l", "l", "o".
Subword tokenization strikes a balance between the two. It keeps common words whole but breaks down rare or tricky words into smaller chunks. Most modern AI models use this method. For example, "annoyingly" might be split into "annoying" plus "ly".
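Here's a toy sketch of the three approaches side by side. Real subword tokenizers (BPE, WordPiece and friends) learn their splits from training data, so the "subword" line below is just a hand-picked illustration.

```python
import re

text = "annoyingly running"

# Word tokenization: split on spaces and punctuation.
word_tokens = re.findall(r"\w+|[^\w\s]", text)

# Character tokenization: one token per letter.
char_tokens = list(text.replace(" ", ""))

# Subword tokenization: common chunks stay whole, rare words get split.
subword_tokens = ["annoying", "ly", "running"]  # hand-picked example

print(word_tokens)     # ['annoyingly', 'running']
print(char_tokens)     # ['a', 'n', 'n', 'o', 'y', ...]
print(subword_tokens)  # ['annoying', 'ly', 'running']
```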
Permanent vs. Temporary Tokens
In JLLM, not all tokens are created equal. Some tokens are like the sturdy foundation of your AI's personality: they stick around no matter what. Others are more like marker on a whiteboard: they appear during the conversation but eventually get wiped away to make room for new stuff.
Let's start with the permanent tokens. These are the pieces that form the backbone of your AI bot. They show up every single time the AI processes a message, no matter how long the chat gets. Permanent tokens include things like Advanced Prompts (APs), Chat Memory (CM), Personality/Prompt, and Scenario.
Temporary tokens, on the other hand, are the messages you and the bot exchange: your words, its replies, and the immediate flow of the chat. Unlike permanent tokens, these get pushed out as new messages come in. Think of it like a whiteboard: you keep writing, and the oldest stuff gets erased to make room.
Here's the catch: the more permanent tokens you use, the less space you leave for temporary tokens, the actual chat itself. So if your bot's permanent setup is huge and complex, you might only have a small token "budget" left for the conversation. It's like trying to write a whole novel on a tiny napkin. Eventually, the story falls apart because you run out of room.
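The math behind that napkin is simple. Here's a back-of-the-envelope sketch with made-up field sizes and a roughly 9,000-token window (check the Janitor Discord for the current limits):

```python
# Budget math: permanent tokens eat into the space left for the chat.
CONTEXT_LIMIT = 9_000  # assumed JLLM-ish window; check Discord for real limits

permanent = {            # example sizes, not recommendations
    "advanced_prompt": 300,
    "chat_memory": 200,
    "personality": 2_500,
    "scenario": 400,
}

permanent_total = sum(permanent.values())
chat_budget = CONTEXT_LIMIT - permanent_total

print(f"Permanent tokens: {permanent_total}")       # 3400
print(f"Left for the conversation: {chat_budget}")  # 5600
```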
Think of your AI as a goldfish with a tiny notepad. You want to jot down just the most important notes that help the conversation, not write a whole encyclopaedia. Keeping permanent info short and sweet makes the chat flow smoother and keeps your AI sharp and focused.
Keeping Bots Lean for Better Performance
It's tempting to think that the more lore, backstory, and personality you cram into a bot, the better it'll perform. But here's the truth: bots with massive permanent setups don't work better; they just break faster.
Bots run on a limited pool of tokens. Every word you feed into your bot's prompt, personality, scenario, advanced prompts, and chat memory eats into that pool. The more space those permanent pieces take up, the less room you have left for actual conversation. Once that limit's hit, the AI doesn't stop; it just quietly starts forgetting the oldest parts of the chat to make room for new ones. No warnings, no errors. Just lost names, missing plot threads, and glitchy, repetitive responses.
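Here's a minimal sketch of that silent forgetting, purely illustrative and not JLLM's actual code: once permanent tokens plus chat history blow past the limit, the oldest messages get dropped first, with no warning.

```python
def trim_history(messages, permanent_tokens, limit, count_tokens):
    """Drop the oldest messages until everything fits under `limit`."""
    kept = list(messages)
    while kept and permanent_tokens + sum(count_tokens(m) for m in kept) > limit:
        kept.pop(0)  # the oldest message quietly disappears
    return kept

# Crude word count standing in for a real tokenizer:
history = [
    "You meet the pirate at the tavern.",
    "He grins and pours you a drink.",
]
print(trim_history(history, permanent_tokens=8_990, limit=9_000,
                   count_tokens=lambda m: len(m.split())))
# -> ['He grins and pours you a drink.']
```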
This kind of silent failure is what trips up a lot of users. A 3,000-token character file might look impressive, but it leaves so little space for active memory that the bot starts stumbling over itself. It'll forget your setup, confuse relationships, and slowly drift into nonsense. This is why the "goldfish with a notepad" metaphor works so well. The AI isn't a librarian that stores everything you've ever told it. It needs quick, useful notes, not a wiki article.
Good bot design is all about cutting the fat. If your pirate doesn't need a 500-word childhood tale to flirt at the tavern, don't include it. Focus on what actually helps the AI do its job. Summarize instead of archiving. Don't copy-paste the same info into multiple fields. Think like a screenwriter, not a historian: what does the bot need to know right now to act right?
And this goes for you as the user, too. If you're deep into a roleplay and something important happened a dozen messages ago, you might need to remind the AI about it. Just give it a quick recap in the Chat Memory (CM). You're the bot's short-term memory assistant, its keeper of context! Because when the AI forgets, it's not being lazy. It just ran out of space.
If things still go sideways and your bot's repeating itself or forgetting key points, don't waste time trying to patch it up. Just start a new chat and transplant. Bring only the essentials with you: the core setup and any major developments the bot needs to remember. Clean slates work wonders.
Bottom line: clear, focused setups lead to better, longer-lasting chats. Most well-made bots perform best with under 1,500 tokens of permanent info. Push past 2,000, and you're already skating on thin ice. When in doubt, less is almost always more.
Next Up: What Makes A Good Persona?
Updated on: 03/08/2025