Frequently asked questions

What is perceptual hashing (pHash) for images?

Perceptual hashing converts an image into a short fingerprint based on its visual content, not its bytes. Two images that look nearly identical get similar hashes. Comparing hashes is fast and works even if one copy was resaved, lightly compressed, or slightly resized.
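Here is a minimal sketch of the classic DCT-based pHash, for illustration only. It assumes the image has already been decoded, converted to grayscale, and resized to a 32 x 32 grid of numbers, which a real pipeline would do with an image library. The 32 x 32 input, the 8 x 8 low-frequency block, and the median threshold are common textbook choices, not documented refern parameters. `phash` is a hypothetical name.

```python
import math
from statistics import median

SIZE = 32  # assumed input: SIZE x SIZE grayscale values, already resized
KEEP = 8   # keep the top-left KEEP x KEEP low-frequency DCT coefficients


def _dct_coeff(pixels, u, v, cos_table):
    """Unnormalized 2D DCT-II coefficient (u, v).

    A uniform scale factor does not change which coefficients exceed the median.
    """
    total = 0.0
    for x in range(SIZE):
        cx = cos_table[u][x]
        row = pixels[x]
        for y in range(SIZE):
            total += row[y] * cx * cos_table[v][y]
    return total


def phash(pixels):
    """Return a 64-bit perceptual hash (hypothetical helper) of a SIZE x SIZE grid."""
    # cos_table[k][x] = cos((2x + 1) * k * pi / (2 * SIZE)) for the kept frequencies
    cos_table = [
        [math.cos((2 * x + 1) * k * math.pi / (2 * SIZE)) for x in range(SIZE)]
        for k in range(KEEP)
    ]
    coeffs = [_dct_coeff(pixels, u, v, cos_table) for u in range(KEEP) for v in range(KEEP)]
    threshold = median(coeffs)
    bits = 0
    for c in coeffs:
        # One bit per coefficient: 1 if it is above the block's median
        bits = (bits << 1) | (1 if c > threshold else 0)
    return bits
```

Two hashes are then compared by counting the bits that differ, for example `bin(a ^ b).count("1")`. Shifting every pixel's brightness by a constant changes only the DC coefficient, which is why mild exposure tweaks tend to leave a hash like this unchanged.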

Does refern send my images to a server to find duplicates?

No. Duplicate detection in refern is entirely local. The pHash is computed when your image is indexed and stored in the local SQLite database. No image data leaves your device at any point.

Can refern find duplicates that are slightly different, such as a cropped or resaved copy?

Yes. Perceptual hashing tolerates minor differences like JPEG recompression, slight resizing, or small crops. Images that are visually near-identical but not byte-for-byte identical will still match. Very heavy edits, strong filters, or significant crops usually will not.
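In practice, tolerance usually means counting the bits that differ between two hashes (the Hamming distance) and accepting pairs under a cutoff. This is a minimal sketch. The 10-bit cutoff is an assumed example value, not a documented refern setting, and `is_near_duplicate` is a hypothetical helper.

```python
MAX_DISTANCE = 10  # assumed cutoff in bits; not a documented refern value


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two 64-bit hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_distance: int = MAX_DISTANCE) -> bool:
    """Treat two images as near-duplicates when their hashes differ in few bits."""
    return hamming(a, b) <= max_distance
```

A lightly recompressed copy typically flips only a few of the 64 bits. A strong filter or a big crop flips many more, which pushes the pair above the cutoff.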

PureRef, BeeRef, and Allusion all have a duplicate detection feature, right?

None of the three have duplicate detection. PureRef has no search or metadata system at all. BeeRef has no search. Allusion has tag-based filtering but no pHash or visual similarity. refern is the only one of these tools that ships duplicate detection.

How large a library can refern's duplicate scan handle?

The pHash for each image is stored at index time, so the is:duplicate search is a fast database query, not a real-time scan. A user with 27,000 images confirmed smooth performance. The pipeline is designed to scale well beyond that.
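This guide does not say how refern indexes stored hashes internally. A BK-tree is one common technique for finding every hash within a Hamming distance without comparing all pairs, and it is sketched below as a swapped-in illustration. `BKTree` and its methods are hypothetical names, and a real index would also need deletion and persistence.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def hamming(a: int, b: int) -> int:
    """Bits that differ between two hashes."""
    return bin(a ^ b).count("1")


@dataclass
class _Node:
    phash: int
    ids: List[str]  # every image sharing this exact hash
    children: Dict[int, "_Node"] = field(default_factory=dict)  # keyed by distance


class BKTree:
    """Hypothetical BK-tree keyed on Hamming distance between perceptual hashes."""

    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def add(self, phash: int, image_id: str) -> None:
        if self.root is None:
            self.root = _Node(phash, [image_id])
            return
        node = self.root
        while True:
            d = hamming(phash, node.phash)
            if d == 0:
                node.ids.append(image_id)  # identical hash: same node
                return
            child = node.children.get(d)
            if child is None:
                node.children[d] = _Node(phash, [image_id])
                return
            node = child

    def search(self, phash: int, max_distance: int) -> List[str]:
        """IDs of all images whose hash is within max_distance bits of phash."""
        found: List[str] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = hamming(phash, node.phash)
            if d <= max_distance:
                found.extend(node.ids)
            # Triangle inequality: a match below child k needs |d - k| <= max_distance
            for k, child in node.children.items():
                if abs(d - k) <= max_distance:
                    stack.append(child)
        return found
```

Hamming distance obeys the triangle inequality, so a search only descends into children whose edge distance lies within the cutoff of the current node's distance. For small cutoffs this usually means visiting only a fraction of the tree.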

Find Duplicate Images in Your Library (2026 Guide)

By refern. Last updated: June 2026.

The short answer: type is:duplicate in refern's search bar and you get every image in your library that has a near-duplicate, detected by perceptual hashing on your local machine. No cloud, no upload, no waiting. From there you review each set of copies and delete the ones you do not want.

The rest of this guide walks through exactly how to do that, why duplicates accumulate in the first place, and what to do when the results need a closer look.

Why reference libraries fill up with duplicates

If you have been collecting references for more than a few months, you almost certainly have duplicates. They get in through several routes:

You save the same image twice from the browser because you forgot you already had it.
You import a folder that overlaps with one you imported before.
You drag an image out of a canvas or project folder into your main library, not realizing you already collected the original.
A site serves the same image at two different URLs (different crop, different resolution, same visual content) and you saved both.
You downloaded a JPEG and a PNG of the same illustration from different sources.

None of these are failures of discipline. They are just what happens when a library grows organically. The problem is that once you have several thousand images, spotting a duplicate by eye is not practical.

Before you start

You need refern installed and a workspace set up. If you have not done that yet, download it from refern.app and follow the first-run prompts to point refern at the folder where your references live. refern indexes the files in place and never copies or moves them.

Wait for the initial indexing to complete; the progress bar in the pipeline card disappears when it is done. For libraries under 50,000 files this usually takes a few minutes. Larger libraries take longer, but you can close the app and indexing resumes where it left off the next time you open it.
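Resuming after a restart only requires the index to remember what it has already processed. This is a minimal sketch of one common approach, assuming the database records each file's path and modification time. `files_to_index` is a hypothetical helper, not refern's code.

```python
import os
from typing import Dict, Iterable, List


def files_to_index(paths: Iterable[str], indexed: Dict[str, float]) -> List[str]:
    """Return files that are new or changed since they were last indexed.

    `indexed` (assumed) maps path -> modification time recorded when the file
    was hashed. After an interrupted run, already-recorded files are skipped.
    """
    pending = []
    for path in paths:
        if indexed.get(path) != os.path.getmtime(path):
            pending.append(path)
    return pending
```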

Once indexing is done, your images all have their pHash values computed and stored in the local database. The duplicate search is a fast query against those stored values, not a real-time scan.
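Here is a minimal sketch of the stored-hash idea using Python's built-in sqlite3 module. The table layout, column names, and 10-bit cutoff are assumptions for illustration, not refern's actual schema. SQLite integers are signed 64-bit and SQLite has no built-in popcount, so the sketch stores hashes in two's complement and registers a `hamming` SQL function from Python.

```python
import sqlite3

MASK64 = (1 << 64) - 1
MAX_DISTANCE = 10  # assumed cutoff in bits


def to_signed(h: int) -> int:
    """SQLite INTEGER is signed 64-bit, so store unsigned hashes in two's complement."""
    return h - (1 << 64) if h >= (1 << 63) else h


def hamming(a: int, b: int) -> int:
    """Hamming distance between two stored (possibly negative) 64-bit values."""
    return bin((a ^ b) & MASK64).count("1")


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.create_function("hamming", 2, hamming)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL, phash INTEGER NOT NULL)"
    )
    return conn


def add_image(conn: sqlite3.Connection, path: str, phash: int) -> None:
    """Called once at index time; the hash is not recomputed at search time."""
    conn.execute(
        "INSERT OR REPLACE INTO images (path, phash) VALUES (?, ?)",
        (path, to_signed(phash)),
    )


def duplicate_pairs(conn: sqlite3.Connection, max_distance: int = MAX_DISTANCE):
    """Pairs of stored images whose hashes fall within the cutoff."""
    rows = conn.execute(
        "SELECT a.path, b.path FROM images a JOIN images b ON a.id < b.id "
        "WHERE hamming(a.phash, b.phash) <= ? ORDER BY a.id, b.id",
        (max_distance,),
    )
    return rows.fetchall()
```

The self-join in `duplicate_pairs` compares every pair, so its cost is quadratic. It shows that the search reads only stored integers and never opens an image file. A large library would want an index structure on top of this.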

Step 1: Run the duplicate search

Click the search bar at the top of the library view (or press Ctrl+K / Cmd+K to open the search overlay).

Type:

is:duplicate

Hit Enter. refern queries all pHash values in the library and returns every image that has at least one near-identical counterpart.

The result set includes every member of each duplicate group. If you have three copies of the same image, all three appear. If your library has 500 images and 20 of them form 10 duplicate pairs, you see 20 results.
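Turning matched pairs into groups is a classic union-find job. If A matches B and B matches C, all three end up in one group, and every member of a group of two or more appears in the results. This is a minimal sketch with hypothetical names, not refern's implementation.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def duplicate_groups(pairs: Iterable[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Merge matching pairs into groups using union-find."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups: Dict[Hashable, Set[Hashable]] = {}
    for item in parent:
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())
```

With 10 independent pairs this yields 10 groups and 20 members in total, which matches the count in the example above.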

You can combine the operator with other filters. For example:

is:duplicate tag:anatomy shows only duplicate images that are also tagged "anatomy."
is:duplicate in:moodboard-folder narrows the scan to one folder.
is:duplicate rating:>=3 shows only highly rated duplicates, which is useful when you want to confirm you are keeping the better copy before deleting the other.
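Combined operators act as a logical AND, so an image appears only if it passes every filter. This is a minimal sketch over an in-memory list. The `Image` fields and the parsing rules are hypothetical and cover only the operators shown above, not refern's full query syntax.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Image:  # hypothetical record, not refern's data model
    path: str
    folder: str
    rating: int = 0
    tags: Set[str] = field(default_factory=set)
    is_duplicate: bool = False


def parse_query(query: str) -> List[Callable[[Image], bool]]:
    """Turn e.g. 'is:duplicate tag:anatomy rating:>=3' into a list of predicates."""
    filters: List[Callable[[Image], bool]] = []
    for term in query.split():
        key, _, value = term.partition(":")
        if key == "is" and value == "duplicate":
            filters.append(lambda img: img.is_duplicate)
        elif key == "tag":
            filters.append(lambda img, t=value: t in img.tags)
        elif key == "in":
            filters.append(lambda img, f=value: img.folder == f)
        elif key == "rating":
            if value.startswith(">="):
                filters.append(lambda img, n=int(value[2:]): img.rating >= n)
            else:
                filters.append(lambda img, n=int(value): img.rating == n)
        else:
            raise ValueError(f"unsupported term: {term}")
    return filters


def search(images: List[Image], query: str) -> List[str]:
    """Paths of images that satisfy every filter in the query (AND semantics)."""
    filters = parse_query(query)
    return [img.path for img in images if all(f(img) for f in filters)]
```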

Step 2: Review each duplicate pair

The search returns a flat grid. You want to look at each pair side by side before deleting.

Click any image to open the metadata sidebar on the right. Check the source URL, rating, tags, and notes. This tells you which copy has more metadata attached to it, which copy you imported more recently, and which you rated higher.

For a closer look, right-click the image and choose "Open original." This opens the file directly in your OS default viewer, so you can compare the full resolution of both copies.

Things to check:

Resolution. If one copy is 4000 x 3000 and the other is 800 x 600, keep the larger one.
Format. A PNG original is usually better than a JPEG recompression. Check the file extension in the filename or metadata panel.
Metadata richness. If one copy has tags, a source URL, and a rating while the other is bare, keep the richer one.
Tags and links. If one copy is tagged and linked to canvases or groups, deleting it breaks those connections. Prefer to delete the untagged copy.
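The checklist above can be sketched as a ranking that picks which copy to keep. The field names, the lossless extension list, and the priority order (linked copies first, then pixel count, then lossless format, then attached metadata) are all assumptions. The sketch models the manual judgment and is not a documented refern feature.

```python
import os
from dataclasses import dataclass, field
from typing import List, Set, Tuple

LOSSLESS_EXTENSIONS = {".png", ".tif", ".tiff"}  # assumed list of lossless formats


@dataclass
class Copy:  # hypothetical record for one copy in a duplicate group
    path: str
    width: int
    height: int
    tags: Set[str] = field(default_factory=set)
    source_url: str = ""
    rating: int = 0
    linked_canvases: int = 0  # hypothetical count of canvases/groups using this copy


def keep_score(c: Copy) -> Tuple[bool, int, bool, int]:
    """Higher tuples win: linked copies, then resolution, format, metadata."""
    lossless = os.path.splitext(c.path)[1].lower() in LOSSLESS_EXTENSIONS
    metadata = len(c.tags) + bool(c.source_url) + (c.rating > 0)
    return (c.linked_canvases > 0, c.width * c.height, lossless, metadata)


def split_keep_delete(group: List[Copy]) -> Tuple[Copy, List[Copy]]:
    """Return the copy to keep and the remaining copies to review for deletion."""
    keeper = max(group, key=keep_score)
    return keeper, [c for c in group if c is not keeper]
```

Links come first here because deleting a linked copy breaks its canvases or groups. Reorder the tuple in `keep_score` if you would rather always keep the largest file.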

Step 3: Delete the copies you do not need

Select the image you want to remove. Press Delete or right-click and choose "Move to trash." refern soft-deletes the item. Its thumbnail and library entry disappear, but the original file on disk moves to refern's trash area instead of being permanently deleted, so you can still undo the deletion.

To permanently remove files from disk, open the trash (Settings, Trash), select the items, and choose "Delete permanently."

When you want the duplicate out of refern but still on disk, use "Remove entry" instead of deleting. It drops the item from refern's library and leaves the file exactly where it is. This is the right choice when the "duplicate" is a copy you keep deliberately (a client deliverable, an export, a backup) and only want out of your library view. One caveat: because the file is still sitting in an indexed folder, the next sync picks it up again as a fresh item. If you want it gone from the library for good, move the file out of your workspace folders first, or hide the folder it lives in.
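The re-sync caveat follows from how a folder sync typically works: it adds any file on disk that the library does not already track. This is a minimal sketch under that assumption. `new_items_on_sync` is hypothetical, and for brevity it checks only a file's immediate folder against the hidden list.

```python
import os
from typing import Iterable, List, Set


def new_items_on_sync(
    disk_files: Iterable[str], library: Set[str], hidden_folders: Set[str]
) -> List[str]:
    """Files a sync would (re)add: on disk, not in the library, not in a hidden folder."""
    added = []
    for path in disk_files:
        folder = os.path.dirname(path)  # immediate folder only (simplification)
        if path not in library and folder not in hidden_folders:
            added.append(path)
    return added
```

A removed entry whose file still sits in a visible folder comes back on the next sync. Moving the file out or hiding its folder stops it from returning.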

If you have many duplicates, you can multi-select in the duplicate search results. Hold Shift and click to range-select, or Ctrl/Cmd and click to pick individual images. Then delete all selected at once.

A practical order of operations:

Sort by date added (oldest first) so the original tends to appear before the repeat.
Scan through, rating the best copy with at least one star if you have not already.
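Filter to is:duplicate rating:0 (no rating) and delete all of those. This leaves only the rated copies. Because this step deletes every unrated duplicate, make sure the previous step gave each group a rated copy; otherwise a group with no rated copy loses every image.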
Re-run is:duplicate to confirm the count dropped.
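The rate-then-filter workflow above can be sketched as collecting the unrated members of each duplicate group. The sketch adds one guard that the manual version leaves to you: it skips any group with no rated copy, so the pass never selects every copy of an image. The group structure and field names are hypothetical.

```python
from typing import Dict, List


def unrated_to_delete(groups: List[List[Dict]]) -> List[str]:
    """Paths of unrated copies, taken only from groups that still keep a rated copy."""
    to_delete: List[str] = []
    for group in groups:
        if not any(item["rating"] > 0 for item in group):
            continue  # nothing rated yet: deleting unrated copies would empty the group
        to_delete.extend(item["path"] for item in group if item["rating"] == 0)
    return to_delete
```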

Step 4: Use "Find similar" for near-duplicates that pHash misses

pHash matches images that are visually near-identical, within a small tolerance. A heavily cropped version of the same image, or one with a strong color grade applied, may fall outside that tolerance and not match.

For those cases, right-click any image in the library or on a canvas and choose "Find similar." This opens a radial menu with a visual similarity search. It uses a 512-byte local descriptor (HSV histogram, dominant colors, color layout, edge histogram) rather than pHash, so it catches softer matches.

This is useful for:

Finding that you saved both a tight crop and the full original.
Spotting images from the same photoshoot (same lighting, same model, slightly different pose).
Locating multiple versions of an illustration that went through different color palettes.

Visual similarity results appear in the search overlay, ranked by similarity score. Review them the same way as the pHash results.
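The guide names the descriptor's four components but not their layout, so this minimal sketch covers only one of them: a coarse HSV color histogram compared by histogram intersection. The 8 x 3 x 3 binning and the intersection score are assumptions, not refern's parameters. Pixels arrive as RGB tuples rather than decoded files.

```python
import colorsys
from typing import Iterable, List, Tuple

H_BINS, S_BINS, V_BINS = 8, 3, 3  # assumed quantization: 72 bins in total


def hsv_histogram(pixels: Iterable[Tuple[int, int, int]]) -> List[float]:
    """Normalized HSV histogram of RGB pixels (0-255 per channel)."""
    hist = [0.0] * (H_BINS * S_BINS * V_BINS)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hi = min(int(h * H_BINS), H_BINS - 1)
        si = min(int(s * S_BINS), S_BINS - 1)
        vi = min(int(v * V_BINS), V_BINS - 1)
        hist[(hi * S_BINS + si) * V_BINS + vi] += 1
        count += 1
    return [x / count for x in hist] if count else hist


def similarity(a: List[float], b: List[float]) -> float:
    """Histogram intersection: 1.0 for identical color distributions, 0.0 for disjoint."""
    return sum(min(x, y) for x, y in zip(a, b))
```

A histogram ignores where colors sit in the frame. That is why a descriptor like this can still match a tight crop whose colors resemble the original. A full descriptor would combine it with the other components.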

Step 5: Set up a smart folder to catch future duplicates

Once your library is clean, you want to stop duplicates from piling up again.

Go to the smart folders panel (left sidebar, the folder icon with a filter mark). Create a new smart folder. Set its query to is:duplicate. Name it "Duplicates."

Now whenever duplicates accumulate, that folder shows a non-zero count as a visual reminder. You can check it after a big import session instead of having to remember to run the search manually.

Common problems and fixes

"I ran is:duplicate and got zero results even though I know I have duplicates."

Check that indexing finished. If the pipeline progress card is still visible, the pHash values have not all been computed yet. Wait for it to complete, then retry.

"I deleted the wrong copy and now the better file is in trash."

Go to Settings, Trash. Find the file, right-click, and choose Restore. It returns to its original folder with all metadata intact.

"The duplicate search returned two images that look completely different to me."

pHash is a visual hash, not a byte hash, and it summarizes only coarse, low-frequency structure. Two abstract textures or smooth color gradients that look different to you can still produce close hashes. Review the pair manually. If they are genuinely different, keep both. The false-positive rate is low but not zero.

"I have thousands of results and no time to review them all."

Use the combination filters described in Step 1. Start with is:duplicate rating:0 plus a tag filter (for example, is:duplicate rating:0 tag:anatomy) to pick off the lowest-value duplicates first. Then work through your tag categories one by one rather than trying to process everything at once.

How other tools compare on duplicate detection

PureRef, BeeRef, and Allusion do not have any form of duplicate detection.

PureRef has no search, no metadata index, and no tag system at all. pureref.com/handbook/features confirms this. Finding a duplicate in PureRef means scrolling the canvas by eye. At any board size above a few dozen images this is impractical.

BeeRef (free, open source, GPL-3.0) also has no search or metadata layer. The feature list at beeref.org shows a canvas-focused tool with no library system. There is no path to duplicate detection without a database, and BeeRef has none.

Allusion (free, GPL-3.0) is the closest competitor in spirit. It indexes files by tag and folder, offers basic tag-based search, and ships hierarchical tags out of the box, which is a genuine strength for smaller libraries that do not need deduplication. However, it has no pHash, no visual similarity, and no duplicate operator. A GitHub issue (#640) documents a RAM crash at 358 images during thumbnail generation. The project has been effectively unmaintained since February 2023, so this gap is unlikely to close.

refern is the only one of these four tools that ships local duplicate detection. The pHash is computed at index time, so the is:duplicate query stays fast even on large libraries.

Next steps

Once your library is clean:

Set up a smart folder for ongoing duplicate monitoring as described above.
If you are coming from PureRef and want to understand what else refern adds, see the refern vs PureRef comparison.
If you are evaluating whether to switch from Eagle or are managing a very large library, the best Eagle alternatives roundup covers the full landscape.
Use is:duplicate combined with rating:>=3 occasionally to audit whether any of your highest-rated references have duplicates that could be cleaned up.

A clean library is faster to search, easier to browse, and less likely to surface the wrong version of an image when you are mid-project and in a hurry. Running the duplicate scan once a month, especially after a batch import session, keeps things manageable.

One library for your references, with a canvas built in.

Keep reading

Best PureRef Alternatives for Linux in 2026

Best Reference Managers for Artists in 2026

Best Reference Managers for Artists 2026 (Top 10)

Tag Reference Images So You Can Actually Find Them (2026)