Detecting fraud rings: the social-graph problem in disguise

Fraud at scale usually gives itself away through repetition. One phone number behind twelve identities. Same device fingerprint reused across signups. A shipping address that keeps showing up on accounts that supposedly have nothing to do with each other.

At some point the accounts stop looking independent. They're a ring.

Finding rings is a graph problem. Nodes are accounts, edges are shared attributes, and what you want is connected components.

The usual recommendation is to throw the data into Neo4j and solve it there with Cypher queries. That absolutely works. It's also a lot of infrastructure for something your warehouse can often handle directly.

This post is how to do it with regular SQL.

If you work on AML or transaction monitoring, the same shape applies. KYC profile overlaps are graph problems too, just with different attribute labels on the nodes.

What a fraud ring looks like

Simple example:

Account A and account B share a phone number at signup.
Account B and account C share a shipping address.
Account C and account D share a device fingerprint.
Account E and account A share an IP at first transaction.

A, B, C, D, and E are connected. Looked at individually they look like five separate customers. Together they're almost always one operator running cards or laundering accounts.

Single shared attributes catch the obvious cases. The harder problem is following those connections outward and figuring out which accounts belong to the same cluster.

The data

Assume you have an accounts table and a shared_attributes table:

-- accounts
account_id | signup_date | status
-----------|-------------|--------
1          | 2026-01-01  | active
2          | 2026-01-02  | active
...

-- shared_attributes
account_a | account_b | attribute_type | attribute_value
----------|-----------|----------------|----------------
1         | 7         | phone          | +1-555-0100
2         | 14        | shipping_address | 123 Main St
...

The shared_attributes table has one row per pair of accounts that share at least one identifying attribute. The exact build step depends on what identifiers you have available. The recursive part below assumes it exists.

If you don't have it pre-built, you can build it inline:

SELECT
  a1.account_id AS account_a,
  a2.account_id AS account_b,
  'phone' AS attribute_type,
  a1.phone AS attribute_value
FROM accounts a1
JOIN accounts a2
  ON a1.phone = a2.phone
 AND a1.account_id < a2.account_id;

The account_id < account_id predicate dedupes so you don't get both A-B and B-A. Repeat the shape for each attribute and UNION ALL the results. Materialize it as a table if it gets large.

Finding pairs (the easy part)

Pairs are just a SELECT against shared_attributes.

SELECT
  account_a,
  account_b,
  count(*) AS shared_attribute_count
FROM shared_attributes
GROUP BY 1, 2
HAVING count(*) >= 2
ORDER BY shared_attribute_count DESC;

Two shared attributes are usually a much stronger signal than one. Shared phone by itself might just be roommates or family. Shared phone plus device plus address is where things start getting interesting fast.

Finding clusters (the recursive part)

For clusters of arbitrary size you need a recursive CTE. Start from an account, follow every edge to find accounts connected to it, then follow every edge from those, and so on, until no new accounts get added.

WITH RECURSIVE ring AS (
  -- Seed: pick a starting account
  SELECT
    account_a AS member,
    1 AS depth
  FROM shared_attributes
  WHERE account_a = 1

  UNION

  -- Step: add accounts connected to any current ring member
  SELECT
    CASE
      WHEN s.account_a = r.member THEN s.account_b
      ELSE s.account_a
    END AS member,
    r.depth + 1
  FROM ring r
  JOIN shared_attributes s
    ON r.member IN (s.account_a, s.account_b)
  WHERE r.depth < 10
)
SELECT DISTINCT member FROM ring;

One easy mistake here is using UNION ALL instead of UNION.

If A connects to B and B connects back to A, the recursion keeps walking the same loop forever unless duplicates get removed during each pass. Regular UNION handles that automatically.

In practice most of the clusters I've seen are relatively small, so a depth cap like 10 is usually enough to keep runaway recursion under control. The cap is also there for the rare malformed shared_attributes table that would otherwise blow up the query.

A note on platform support

Recursive CTEs work in Postgres, Snowflake, BigQuery, SQL Server, MySQL 8+, and SQLite. The syntax above is close to ANSI standard SQL and runs in all of them with minor adjustments. Postgres tends to be the cleanest. Snowflake and BigQuery both handle it but you'll see different optimizer behavior.

BigQuery's GoogleSQL graph queries went GA in 2025. If you're already on BigQuery and your rings are large enough that the recursive CTE feels slow, the native graph syntax is the cleaner path before you reach for an external graph database.

Finding all rings (not just one seed)

The query above starts from one account. To enumerate every distinct ring across your whole population you need a different approach: assign every account to a connected component.

WITH RECURSIVE component AS (
  SELECT
    account_a AS member,
    least(account_a, account_b) AS component_id
  FROM shared_attributes

  UNION

  SELECT
    s.account_b,
    c.component_id
  FROM component c
  JOIN shared_attributes s
    ON s.account_a = c.member
  WHERE NOT EXISTS (
    SELECT 1
    FROM component c2
    WHERE c2.member = s.account_b
      AND c2.component_id <= c.component_id
  )
)
SELECT component_id, count(DISTINCT member) AS ring_size
FROM component
GROUP BY 1
HAVING count(DISTINCT member) >= 3
ORDER BY ring_size DESC;

least(account_a, account_b) AS component_id picks the lowest account_id in any pair as the canonical name for the component. The NOT EXISTS check stops members from getting re-added under a higher component_id.

This version gets expensive a lot faster than the seeded query, so it's usually something you'd run as a scheduled batch process instead of inline with analyst queries.

Where this hits false positives

Not every shared attribute means fraud. The signals that get noisy in practice:

Apartment complexes and dorms. Lots of unrelated accounts share an address.
Household phone plans. Family members share the primary line.
Employer-provided devices, IPs, and mailing addresses. Corporate accounts cluster.
Mailbox stores and PO box services. Legitimate small businesses use these too.

The fix isn't to drop those signals entirely. It's to weight them. A shared apartment address by itself is weak. Shared apartment address plus shared device plus shared payment method is still strong even if any one signal is noisy on its own. The component rollup naturally surfaces the dense clusters where multiple weak signals coincide.

When SQL stops being enough

The recursive CTE handles components up to a few thousand members fine on a modern warehouse. Past that, performance degrades because every iteration is a full self-join.

You've outgrown the SQL approach when your largest component hits the tens of thousands of accounts, or when the recursion depth caps you've set are getting hit on the regular. Either of those is your cue to move to a purpose-built graph database (Neo4j, Memgraph, TigerGraph) or a graph extension on your warehouse. The SQL pattern still has a job to do as the feature-engineering layer feeding the graph engine.

Worth flagging since the question comes up: ring detection is inherently batch. It's a complement to your real-time transaction checks, not a replacement. The realtime layer catches the immediate suspicious activity. The ring layer flags accounts to watch even when nothing's currently tripping a realtime rule.

Until you hit that scale, the recursive CTE does the job without making you stand up new infrastructure.

What I'd actually do in a new system

In order:

Build the shared_attributes table from the obvious signals: phone, address, device, IP, payment method.
Run the pair-detection query weekly. Triage the strongest pairs (3+ shared attributes).
Once the pair workflow has a confirmed-fraud feedback loop running, build the recursive CTE for ring detection. Run monthly.
Save the rings as features that feed per-transaction scoring. A transaction from an account in a known ring is automatically high-risk.

The mistake I see new teams make is jumping straight to step 3 and skipping step 2. The recursive query is more fun to write but it's useless without the pair workflow underneath it. No feedback loop means no way to tell which detected rings are real and which are coincidence.

Next post: why your fraud alerts are noisy, and what to actually do about it. Most teams reach for more rules. The real fix is in the feedback loop.

A note on process: AI helps with editing, not engineering. The SQL and patterns are from years of program-integrity work; the model just cuts the rambling.

Fixel Smith is an experienced Program Integrity Analyst working in public-sector data.