Ever stared blankly at a search bar, trying to remember the right keyword to find that perfect travel backpack with USB charging, waterproof lining, and secret compartments?
Yeah, same.
That's why at Nobs, we built an AI that doesn't just search; it understands. Here's how we trained our bot to go from "just keywords" to "shopping buddy with brains."
🧠 Step 1: Give the Bot Some Context
Product listings are messy: you've got names, prices, specs, and descriptions. We mashed all the good stuff together into one rich, contextual string:
    import pandas as pd

    def combine_attributes(row):
        # Gather the fields that describe the product
        attributes = [row['Name'], row['Category'], row.get('SellingPrice', ''),
                      row['Description'], row['Specification']]
        # Skip missing or empty values and convert the rest to strings
        attributes = [str(attr) for attr in attributes if pd.notna(attr) and attr != ""]
        return " ".join(attributes)

Now, instead of just searching by product name, the AI learns from the entire product story.
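To see what that combined string looks like, here's a minimal sketch that applies the same idea to a tiny, made-up catalog. The sample rows and the `combined` column name are illustrative assumptions, not our real data.

```python
import pandas as pd

def combine_attributes(row):
    # Join every non-empty field into one descriptive string
    fields = [row["Name"], row["Category"], row.get("SellingPrice", ""),
              row["Description"], row["Specification"]]
    return " ".join(str(f) for f in fields if pd.notna(f) and f != "")

# Hypothetical two-row catalog; the second row has no price and an empty spec
catalog = pd.DataFrame([
    {"Name": "TrailPack 30L", "Category": "Backpacks", "SellingPrice": 2499,
     "Description": "Waterproof hiking backpack", "Specification": "USB charging port"},
    {"Name": "BudBeat", "Category": "Earphones", "SellingPrice": None,
     "Description": "Bluetooth earphones with mic", "Specification": ""},
])

# axis=1 passes each row to the function
catalog["combined"] = catalog.apply(combine_attributes, axis=1)
print(catalog["combined"].tolist())
```

Missing and empty fields simply drop out, so the second product still gets a clean string built from its name, category, and description.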
Step 2: Teach It to Remember
We used OpenAI's text-embedding-ada-002 model to turn each product into a 1536-dimensional vector. Think of it like compressing each product into a "meaningful fingerprint" of its essence:
    import openai  # legacy SDK interface (openai<1.0)

    # `chunk` is one batch of combined product strings
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=chunk
    )
    embeddings.extend([result["embedding"] for result in response["data"]])

We cached this to avoid burning through credits every time someone searched for "Bluetooth earphones under 2k with mic."
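The `chunk` passed to the API above is one batch of product strings. Here's a rough sketch of how that batching might look. `embed_in_chunks`, `embed_fn`, and the default chunk size of 100 are illustrative assumptions, and `fake_embed` stands in for the real API call so the example runs offline.

```python
from typing import Callable, List

Vector = List[float]

def embed_in_chunks(texts: List[str],
                    embed_fn: Callable[[List[str]], List[Vector]],
                    chunk_size: int = 100) -> List[Vector]:
    """Embed texts one batch at a time; embed_fn wraps the actual API call."""
    embeddings: List[Vector] = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        embeddings.extend(embed_fn(chunk))  # one request per chunk
    return embeddings

# Stand-in for the embedding API (assumption): a toy 2-number "vector" per text
def fake_embed(batch: List[str]) -> List[Vector]:
    return [[float(len(text)), float(text.count(" "))] for text in batch]

vectors = embed_in_chunks(
    ["usb backpack", "bluetooth earphones with mic", "tent"],
    fake_embed,
    chunk_size=2,
)
print(len(vectors))
```

Keeping the API call behind `embed_fn` means the batching logic can be tested without spending a single credit.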
Step 3: Understand Queries Like a Human
When a user types "Need something rugged and waterproof for mountain hiking," most search engines throw up their hands. Ours doesn't.
We built a little GPT-based assistant to refine such natural-language queries and extract meaningful keywords from them:
    import openai  # legacy SDK interface (openai<1.0)

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            # System prompt: ask the model to extract search terms
            {"role": "system", "content": "Extract key search terms for e-commerce."},
            {"role": "user", "content": query}
        ]
    )
    enhanced_query = response.choices[0].message.content.strip()

The AI essentially rephrases vague questions into sharp, searchable terms.
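One practical wrinkle: the call can fail or come back empty, and then it's safer to search with what the user actually typed. Here's a hedged sketch of that fallback. `enhance_query`, `build_messages`, and `chat_fn` are hypothetical names, and the two fake chat functions stand in for the real chat-completion call.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
SYSTEM_PROMPT = "Extract key search terms for e-commerce."

def build_messages(query: str) -> List[Message]:
    """Assemble the system and user messages sent to the chat model."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query},
    ]

def enhance_query(query: str, chat_fn: Callable[[List[Message]], str]) -> str:
    """Return the model's search terms, or the raw query if that fails."""
    try:
        enhanced = chat_fn(build_messages(query)).strip()
    except Exception:
        return query  # API hiccup: fall back to the user's own words
    return enhanced or query  # empty reply: same fallback

# Stand-ins for the real chat call (assumptions, so the sketch runs offline)
def fake_chat(messages: List[Message]) -> str:
    return "  rugged waterproof hiking backpack  "

def broken_chat(messages: List[Message]) -> str:
    raise RuntimeError("rate limited")

print(enhance_query("Need something rugged and waterproof for mountain hiking", fake_chat))
print(enhance_query("tent", broken_chat))
```

Either way the search still runs; the worst case is a slightly less polished query.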
Step 4: Match Like a Pro
We match the user query embedding against all product embeddings using good ol' cosine similarity:
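    from sklearn.metrics.pairwise import cosine_similarity  # assuming scikit-learn

    similarities = cosine_similarity([query_embedding], chunk_embeddings).flatten()

Then we sort by similarity, keep the top results, and boom, relevant recommendations appear like magic ✨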
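For the curious, here's a minimal sketch of the score-and-sort step written with plain NumPy instead of scikit-learn. `top_k_matches`, the toy 2-dimensional vectors, and `k=2` are illustrative assumptions; real embeddings would have 1536 dimensions.

```python
import numpy as np

def top_k_matches(query_embedding, product_embeddings, k=3):
    """Return (index, score) pairs for the k most similar products."""
    q = np.asarray(query_embedding, dtype=float)
    P = np.asarray(product_embeddings, dtype=float)
    # Cosine similarity: dot product divided by the product of the vector norms
    scores = P @ q / (np.linalg.norm(P, axis=1) * np.linalg.norm(q))
    # argsort is ascending, so reverse it and keep the first k indices
    best = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in best]

# Toy product vectors (hypothetical)
products = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matches = top_k_matches([1.0, 0.1], products, k=2)
print(matches)
```

This sketch assumes no embedding is all zeros; a zero vector would cause a division by zero and needs to be filtered out first.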
🧪 Bonus: Caching, Chunking & Smart Retries
We wrapped the heavy lifting in a retry loop to survive API hiccups, used chunking to dodge rate limits, and pickle-dumped embeddings to disk for performance:
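    import pickle

    try:
        # Reuse cached embeddings if they exist on disk
        with open(embeddings_file, 'rb') as f:
            product_embeddings = pickle.load(f)
    except (FileNotFoundError, EOFError, pickle.UnpicklingError):
        # No usable cache: regenerate the embeddings
        product_embeddings = generate_embeddings_in_chunks(product_descriptions)

No drama, no downtime, just AI doing what it does best.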
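Here's a rough sketch of how the retry loop and the disk cache could fit together. `with_retries`, `load_or_build`, the three attempts, and the doubling delay are illustrative assumptions, not a copy of our production code.

```python
import pickle
import tempfile
import time
from pathlib import Path

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); after each failure wait base_delay * 2**attempt seconds, then retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)

def load_or_build(cache_path, build_fn):
    """Return cached data if it is readable; otherwise build it and save it."""
    path = Path(cache_path)
    try:
        with path.open("rb") as f:
            return pickle.load(f)
    except (FileNotFoundError, EOFError, pickle.UnpicklingError):
        data = with_retries(build_fn)
        with path.open("wb") as f:
            pickle.dump(data, f)
        return data

# Demo with a throwaway cache file and stand-in builders (both hypothetical)
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "embeddings.pkl"
    first = load_or_build(cache, lambda: [[0.1, 0.2]])   # builds and saves
    second = load_or_build(cache, lambda: [[9.9, 9.9]])  # served from the cache
print(first == second)
```

One caveat: only unpickle files you wrote yourself, since loading a pickle from an untrusted source can execute arbitrary code.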
Final Thoughts
We didn't reinvent search. We just made it smarter. More human. More helpful. That's the Nobs way: no BS, just results.
Got a messy product catalog or confusing customer queries? Let's talk.
Have questions? Reach out directly at allvaluenobs@gmail.com. We're here to help you get the most from your data.
NoBS is a data science and machine learning company leveraging smart, affordable talent to deliver real impact without inflated costs. We focus on outcomes, not jargon: if we don't deliver value, you don't pay. Specializing in practical, scalable solutions for small and mid-sized companies, we use open-source tools and a no-nonsense approach to keep things simple and cost-effective.
With over 100 solutions built across industries, we're honest, affordable, and focused on results that truly matter.
Website: www.no-bs.in