When ORMs Fail Silently: A Tale of Prisma, PostgreSQL, and Chinese Text

Sometimes the most insidious bugs are the ones that don't throw errors. They just quietly return the wrong results, leaving you to wonder if your data disappeared or your logic is broken. I recently encountered one of these silent failures while developing a Chinese language learning app (see Chinese Learning App), and it taught me some important lessons about ORM abstractions and Unicode handling.

The Context

I was building a Chinese language learning app where users could search for example sentences containing specific Chinese words. This involved working with CJK text (an acronym for Chinese, Japanese, and Korean languages). These languages share several characteristics that make them different from Latin-based languages:

No concept of upper/lowercase: Unlike English where "A" and "a" are different cases of the same letter, Chinese characters like 中 don't have case variations
Complex Unicode representation: The same character might have multiple Unicode codepoints (traditional vs simplified Chinese, different Japanese writing systems, etc.)
Different text processing rules: Many text operations designed for Latin scripts behave differently or unexpectedly with CJK text

The Problem

I was querying example sentences for Chinese words in my app. When a user searched for a word like "奥" (meaning "mysterious","Austrian" or related to "Olympics"), the endpoint should have returned sentences containing that character. Instead, it consistently returned zero results. The frustrating part? I could run the exact same query directly in psql and get the expected matches.

-- This worked fine in psql
SELECT * FROM sentences WHERE content LIKE '%奥%';

But the equivalent query in my app using Prisma returned nothing:

const sentences = await prisma.sentence.findMany({
  where: {
    content: {
      contains: searchTerm // User searched for '奥'
    }
  }
});
// Returns: [] - no results!

The data was definitely there. The query logic looked correct. Everything seemed fine.

The First Layer: ILIKE and Locale-Aware Collation

I enabled query logging on PostgreSQL to see what Prisma was actually sending to the database. The culprit became clear:

-- What Prisma generates
SELECT * FROM sentences WHERE content ILIKE '%奥%';

Prisma's contains filter generates ILIKE (case insensitive LIKE) instead of LIKE. This makes perfect sense for Latin text where you want to match "Hello" with "hello". But here's the problem: PostgreSQL's ILIKE uses locale-aware case folding.

For Chinese characters (and CJK text in general), there's no concept of upper and lower case. When PostgreSQL's collation rules encounter Chinese text in an ILIKE operation, they fall back to an exact match rather than a substring match. No error, no warning, it just silently uses a different behavior that breaks the search functionality in my app.

The Workaround Attempt

My first instinct was to use Prisma's raw query feature:

const sentences = await prisma.$queryRaw`
  SELECT * FROM sentences WHERE content LIKE ${`%${searchTerm}%`}
`;

But I hit another wall. I was using Prisma's @prisma/adapter-pg driver adapter for better performance, and it doesn't support $queryRaw at all, it throws a runtime error.

The solution? Bypass Prisma entirely for this query:

import { Pool } from 'pg';

const pool = new Pool({ connectionString: DATABASE_URL });
const result = await pool.query(
  'SELECT * FROM sentences WHERE content LIKE $1',
  [`%${searchTerm}%`]
);

The Second Layer: Unicode Equivalence

But wait, there's more. Even with the correct LIKE query, I was still getting zero results for some searches. The issue? Unicode equivalence and the complexity of Chinese text.

Here's what was happening: users of my Chinese learning app might search using traditional Chinese characters, but my dataset stored sentences in simplified Chinese. For example:

User searches for: 奧 (traditional Chinese, U+5967)
Database contains: 奥 (simplified Chinese, U+5965)

These characters look nearly identical and represent the same concept, but they're completely different Unicode codepoints. Even a correct substring search won't match them.

This is a common challenge when building Chinese language applications, you need to handle both traditional Chinese (used in Taiwan, Hong Kong, Macau) and simplified Chinese (used in mainland China).

Fortunately, my dataset had both simplified and traditional columns:

-- The final working query
SELECT * FROM sentences 
WHERE simplified LIKE $1 OR traditional LIKE $1;

The Lessons

1. ORM Abstractions Can Silently Fail

ORMs are powerful, but they make assumptions about your data and use cases. Prisma's decision to use ILIKE is sensible for most Western applications, but it breaks down with CJK text. The dangerous part is that you wrong results without any errors ever appearing.

When debugging ORM queries, always check what SQL is actually being generated. Enable query logging and compare the ORM's output with your expectations.

2. Unicode Equivalence Is Complex

Don't assume that visually similar characters are equivalent in Unicode. This is especially critical when building Chinese language applications. Key challenges include:

Traditional vs Simplified Chinese: The same word can have different Unicode codepoints (奧 vs 奥)
Regional variations: Characters may be displayed differently across regions
Input methods: Users might type using different character sets depending on their location or preferences

If you're building a Chinese learning app or any application that deals with Chinese text, plan for these differences from the start. Consider:

Storing multiple variants of the same text
Using specialized search libraries that handle Unicode normalization
Implementing fallback search strategies

3. Know Your Tools' Limitations

I discovered that @prisma/adapter-pg doesn't support $queryRaw, forcing me to drop down to raw SQL. This isn't necessarily bad sometimes the lower-level tool is the right choice but it's important to understand these limitations before they become blockers.

The next time your queries return unexpected results, remember: sometimes the absence of an error is the error itself.