Web Analytics Without Cookies: How It Works

If you have looked at privacy-friendly analytics tools recently, you have probably seen claims like "no cookies, no fingerprinting, fully GDPR compliant." What that actually means in implementation is less often explained.

This post covers the technical reality: the three approaches in use today for cookieless tracking, how each one works, and what the accuracy trade-offs look like.

Brief History: Why Cookies Dominated Analytics

Cookies were invented in 1994 by Lou Montulli at Netscape. The original problem was practical: HTTP is a stateless protocol. Each request is independent of every other. Without some mechanism to persist state on the client side, a shopping cart could not remember what you added on the previous page.

The solution was simple. The server sends a Set-Cookie header with a key-value pair. The browser stores it locally. On every subsequent request to that domain, the browser sends the cookie back in a Cookie header. The server reads the value and knows it is the same browser.

Analytics tools adopted this pattern directly. The standard implementation was to generate a UUID on the first visit, store it in a cookie with an expiration date set 1-2 years in the future, and read that UUID on every pageview. This gave you a persistent client_id that survived browser restarts and accumulated visit history over months.
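As a rough illustration of that pattern, here is a minimal Python sketch using the standard library's http.cookies module. The cookie name client_id matches the text above; the two-year lifetime and the helper names build_set_cookie and read_client_id are assumptions made for the example, not any particular vendor's code.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional

# Assumed lifetime: the upper end of the 1-2 year range mentioned above.
TWO_YEARS_SECONDS = 2 * 365 * 24 * 60 * 60


def build_set_cookie(client_id: str) -> str:
    """Return the value of a Set-Cookie header carrying the visitor ID."""
    cookie = SimpleCookie()
    cookie["client_id"] = client_id
    cookie["client_id"]["max-age"] = TWO_YEARS_SECONDS
    cookie["client_id"]["path"] = "/"
    return cookie["client_id"].OutputString()


def read_client_id(cookie_header: str) -> Optional[str]:
    """Parse a Cookie request header and return the stored visitor ID, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("client_id")
    return morsel.value if morsel is not None else None


# First visit: no cookie yet, so the server mints an ID and sends it back.
new_id = str(uuid.uuid4())
set_cookie_value = build_set_cookie(new_id)

# Later visit: the browser echoes the pair back in the Cookie header.
returned_id = read_client_id(f"client_id={new_id}; theme=dark")
print(returned_id == new_id)  # True
```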

The approach was technically elegant and worked well for about 20 years. Then browsers started systematically breaking it.

Safari's Intelligent Tracking Prevention (ITP), introduced in 2017 and significantly tightened in subsequent releases, began capping cookie lifetimes and blocking third-party cookies. Firefox added similar protections. Chrome took a different path: after years of announcing and postponing a deprecation timeline, Google reversed course in July 2024 and said it would not eliminate third-party cookies, proposing instead to give users a browser-level choice about cross-site tracking alongside its existing Privacy Sandbox initiative. Over the same period, GDPR took effect in 2018, making cookie-based analytics legally complicated for any site with EU users.

The analytics industry responded in three ways.

The Three Cookieless Approaches

1. Browser Fingerprinting

Fingerprinting tries to reconstruct a unique identifier from signals the browser exposes without any explicit storage. The inputs vary by implementation, but commonly include:

Canvas rendering (how the browser draws a specific canvas operation, which differs by GPU, OS, and font rendering engine)
WebGL renderer information
Installed fonts (accessible via font metrics detection)
Screen resolution and color depth
Timezone offset
Navigator properties: navigator.hardwareConcurrency, navigator.deviceMemory, navigator.languages
Audio context characteristics

A well-implemented fingerprint can be statistically unique for a large fraction of browsers. The EFF's Cover Your Tracks project found that approximately 83-87% of desktop browsers have a unique fingerprint detectable by their test, with the variance depending on browser version and installed fonts.
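To make the mechanism concrete, the sketch below shows how collected signals are typically reduced to a single identifier: serialize them in a stable order and hash the result. Real fingerprinting runs in the browser; this Python version only illustrates the combining step, and every signal value and field name in it is hypothetical.

```python
import hashlib
import json


def fingerprint(signals: dict) -> str:
    """Hash a set of browser signals into one stable identifier."""
    # Serialize with sorted keys so the same signals always produce the same bytes.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values a collection script might report for one browser.
browser_a = {
    "canvas_hash": "9f2c1e",
    "webgl_renderer": "ANGLE (Apple, Apple M1, OpenGL 4.1)",
    "fonts": ["Arial", "Helvetica", "Menlo"],
    "screen": [2560, 1440, 24],
    "timezone_offset": -60,
    "hardware_concurrency": 8,
    "languages": ["en-US", "en"],
}

# Same machine profile, but with one extra installed font.
browser_b = {**browser_a, "fonts": ["Arial", "Helvetica", "Menlo", "Fira Code"]}

print(fingerprint(browser_a) == fingerprint(browser_b))  # False
```

A single differing signal is enough to change the identifier, which is why a long list of weak signals can add up to a near-unique one.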

Why fingerprinting is a bad choice:

First, it is legally problematic. The European Data Protection Board (and its predecessor, the Article 29 Working Party) has stated that browser fingerprinting constitutes processing of personal data and requires consent under GDPR. The "privacy-respecting" framing that some fingerprinting-based tools use is not accurate.

Second, it is increasingly inaccurate. Safari, Firefox, and Brave all implement fingerprint randomization or resistance. Canvas fingerprinting returns randomized results in Brave. Firefox's Fingerprinting Protection (enabled in its stricter privacy modes) does the same. Corporate proxies and VPNs cause many users to share a visible IP, and resistance techniques mean the canvas hash no longer distinguishes between them.

Third, it fails on shared machines. A library computer, a shared work laptop, or a family iPad may be used by multiple people who produce similar or identical fingerprints.

2. LocalStorage and SessionStorage

Some tools moved from cookies to browser storage APIs as a workaround. localStorage persists data indefinitely (no expiration). sessionStorage clears when the browser tab closes. Both can store arbitrary strings, making them functionally equivalent to cookies for visitor identification purposes.

This does not solve the legal problem. The ePrivacy Directive, which applies alongside GDPR, covers storing information on a user's device and reading it back, regardless of whether you use cookies, localStorage, sessionStorage, or any other mechanism. If you store a visitor identifier in localStorage, you are still storing an identifier that constitutes personal data, and you still need consent.

Browsers have also started limiting localStorage lifetimes. Safari's ITP already deletes script-writable storage, including localStorage, after seven days without user interaction on the site. The direction of travel is clear.

3. Server-Side Session Hashing with Ephemeral Identifiers

This is the approach used by Abner, and it is the only one that is both technically sound and genuinely privacy-respecting.

The core idea: instead of storing an identifier on the client side, derive a temporary identifier on the server side from information that is already present in every HTTP request, and discard both the identifier and the inputs after the session ends.

How Server-Side Session Hashing Works in Detail

Every HTTP request that reaches your server carries information you can read without any JavaScript, either from headers the browser sends automatically or from the connection itself:

X-Forwarded-For or the socket IP: the client's IP address
User-Agent: a string identifying the browser, version, rendering engine, and OS (for example: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36)
Accept-Language: the browser's preferred languages, such as en-US,en;q=0.9 (available as context but not included in the hash; see formula below)
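Choosing between X-Forwarded-For and the socket IP deserves some care, because any client can send its own X-Forwarded-For header. The sketch below is one conservative way to do it, assuming you know the address ranges of your own proxies. The function name client_ip, the trusted_proxies parameter, and the exact header key casing are assumptions for the example.

```python
import ipaddress
from typing import Mapping, Sequence


def client_ip(headers: Mapping[str, str], socket_ip: str,
              trusted_proxies: Sequence[str]) -> str:
    """Pick the client IP, honoring X-Forwarded-For only behind a known proxy."""
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(addr) -> bool:
        return any(addr in net for net in trusted)

    forwarded = headers.get("X-Forwarded-For", "")
    # If the direct peer is not one of our proxies, the header is untrustworthy.
    if not forwarded or not is_trusted(ipaddress.ip_address(socket_ip)):
        return socket_ip

    # Walk right to left: the rightmost entries were appended by our own proxies,
    # and the first untrusted hop is the best available guess at the client.
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
    for hop in reversed(hops):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            return socket_ip  # malformed entry: fall back to the socket address
        if not is_trusted(addr):
            return str(addr)
    return socket_ip


headers = {"X-Forwarded-For": "203.0.113.7, 10.0.0.5"}
print(client_ip(headers, "10.0.0.1", ["10.0.0.0/8"]))      # 203.0.113.7
print(client_ip(headers, "198.51.100.9", ["10.0.0.0/8"]))  # 198.51.100.9
```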

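The JavaScript snippet can also read a few values before any data is sent to the server. None of them identify a visitor or require storing anything on the device, so they can be collected without consent: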

window.screen.width and window.screen.height
window.innerWidth (viewport width)
document.referrer
window.location.href

You concatenate the IP address and User-Agent with a salt value that is generated fresh every 24 hours, then apply a cryptographic hash function (SHA-256 is standard). The resulting hex digest is your anonymous session token for the day. Accept-Language is not included in the hash itself. IP address and User-Agent provide sufficient entropy for session differentiation, and adding Accept-Language would increase the risk of re-identification without meaningfully improving accuracy.

session_token = SHA256(ip_address + user_agent + daily_salt)

The critical properties of this construction:

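One-way. SHA-256 is a one-way function: given the output hash, you cannot invert it to recover the IP address or User-Agent. Because the IPv4 address space is small enough to enumerate, this protection depends on the salt staying secret while it is in use and being destroyed afterwards. Once the salt is gone, an attacker who fully compromised your analytics database could not work back from session tokens to visitor identities.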

Daily rotation. The salt value changes every calendar day. This means the same browser visiting on Monday and Tuesday produces completely different tokens. There is no linkage between the two visits, even by the analytics vendor. This is not a policy claim; it is a mathematical property of the system.

No client-side storage. The token is computed and used entirely server-side. Nothing is written to cookies, localStorage, sessionStorage, or any other browser storage API.
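The formula and its first two properties can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's production code: the 32-byte salt size and the function names new_daily_salt and session_token are assumptions, and the IP address comes from a range reserved for documentation.

```python
import hashlib
import secrets


def new_daily_salt() -> bytes:
    """Generate a fresh random salt (32 bytes is an assumed size)."""
    return secrets.token_bytes(32)


def session_token(ip_address: str, user_agent: str, daily_salt: bytes) -> str:
    """Return SHA256(ip_address + user_agent + daily_salt) as a hex digest."""
    material = ip_address.encode("utf-8") + user_agent.encode("utf-8") + daily_salt
    return hashlib.sha256(material).hexdigest()


ip = "203.0.113.7"  # documentation-range address, not a real visitor
ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

monday_salt = new_daily_salt()
tuesday_salt = new_daily_salt()

monday_first = session_token(ip, ua, monday_salt)
monday_second = session_token(ip, ua, monday_salt)
tuesday = session_token(ip, ua, tuesday_salt)

print(monday_first == monday_second)  # True: same inputs and salt, same token
print(monday_first == tuesday)        # False: a new salt gives an unrelated token
```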

What Abner Specifically Does

Abner's tracking script (abner.js, approximately 1.8KB) sends a minimal event payload to Abner's ingestion endpoint when a page loads: the page URL, referrer, user agent, viewport dimensions, and UTM parameters. The script sets no cookies and uses no browser storage.
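The actual script runs in the browser, but the shape of such a payload is easy to show server-side. The Python sketch below assembles the fields listed above and pulls UTM parameters out of the page URL. The field names and the build_event helper are hypothetical and are not taken from Abner's documentation.

```python
from urllib.parse import parse_qsl, urlsplit

UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def build_event(page_url: str, referrer: str, user_agent: str,
                viewport_width: int, viewport_height: int) -> dict:
    """Assemble a minimal pageview event (field names are illustrative)."""
    query = dict(parse_qsl(urlsplit(page_url).query))
    # Keep only the UTM parameters; other query parameters are ignored.
    utm = {key: query[key] for key in UTM_KEYS if key in query}
    return {
        "url": page_url,
        "referrer": referrer,
        "user_agent": user_agent,
        "viewport": {"width": viewport_width, "height": viewport_height},
        "utm": utm,
    }


event = build_event(
    "https://example.com/pricing?utm_source=newsletter&utm_medium=email&ref=x",
    "https://news.example.org/",
    "Mozilla/5.0 (X11; Linux x86_64)",
    1280,
    720,
)
print(event["utm"])  # {'utm_source': 'newsletter', 'utm_medium': 'email'}
```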

On the server side, the ingestion layer hashes the visitor's IP address together with the User-Agent string and a daily-rotating salt. The raw IP address is never written to disk or to the ClickHouse analytics database. The resulting hash is stored and used to approximate session boundaries within a single day: a sequence of events sharing the same hash and arriving within a session timeout window are grouped as one session.
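The grouping step can be sketched as follows. The 30-minute timeout is an assumed value, since the window length is not stated here, and group_sessions is a hypothetical helper operating on (token, unix_timestamp) pairs.

```python
from collections import defaultdict

# Assumed window; the actual timeout is not specified in this post.
SESSION_TIMEOUT_SECONDS = 30 * 60


def group_sessions(events, timeout=SESSION_TIMEOUT_SECONDS):
    """Group (token, unix_timestamp) events into sessions per token.

    Returns {token: [[ts, ...], ...]} where each inner list is one session.
    """
    by_token = defaultdict(list)
    for token, ts in events:
        by_token[token].append(ts)

    sessions = {}
    for token, stamps in by_token.items():
        stamps.sort()
        groups = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - groups[-1][-1] <= timeout:
                groups[-1].append(ts)  # within the window: same session
            else:
                groups.append([ts])    # gap too long: start a new session
        sessions[token] = groups
    return sessions


events = [("a", 0), ("a", 600), ("b", 100), ("a", 2401)]
print(group_sessions(events))
# {'a': [[0, 600], [2401]], 'b': [[100]]}
```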

When the next day's salt is generated, the connection between all previous hashes and any real-world visitor identity is broken, permanently. There is no recovery path.
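One way to get that property is to hold only the current day's salt, in memory, and overwrite it at rollover. The sketch below assumes a single ingestion process; the DailySaltStore class is hypothetical, and a multi-server deployment would need some shared mechanism that this example does not cover.

```python
import secrets
from datetime import date
from typing import Optional


class DailySaltStore:
    """Holds exactly one salt in memory: the one for the current day."""

    def __init__(self) -> None:
        self._day: Optional[date] = None
        self._salt: Optional[bytes] = None

    def salt_for(self, today: date) -> bytes:
        if today != self._day:
            # Overwriting drops the only reference to the previous salt,
            # so earlier tokens can no longer be recomputed.
            self._day = today
            self._salt = secrets.token_bytes(32)
        return self._salt


store = DailySaltStore()
monday = store.salt_for(date(2024, 1, 1))
same_day = store.salt_for(date(2024, 1, 1))
tuesday = store.salt_for(date(2024, 1, 2))

print(monday == same_day)  # True: stable within the day
print(monday == tuesday)   # False: replaced at rollover
```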

Comparing the Three Approaches

Accuracy Considerations

Undercounting: multiple people sharing an IP

The biggest accuracy limitation of server-side hashing is that two people on the same network appear as one session if they happen to visit at the same time with the same User-Agent string. This is common on corporate Wi-Fi networks, university campuses, and mobile carrier networks that use carrier-grade NAT.

In practice, the User-Agent string usually differs enough between people on the same network (different browsers, different browser versions, different operating systems) that they produce different hashes. But not always: screen size is not part of the hash, so two identically configured machines behind one IP collapse into a single visitor. This means visitor and session counts tend to be slightly underestimated in enterprise-heavy traffic.

Overcounting: bots

Analytics systems that count every server request are exposed to bots. Search engine crawlers (Googlebot, Bingbot, and others) send requests with identifiable User-Agent strings. Uptime monitors, link checkers, and security scanners do the same. A naive implementation would count all of these as pageviews.

Abner performs bot filtering using User-Agent string analysis against a maintained list of known bot signatures. Requests matching known bots are discarded at the ingestion layer before any data is stored.
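A signature check of this kind can be as simple as a case-insensitive pattern match. The sketch below uses a deliberately tiny, illustrative signature list; it is not Abner's list, and treating an empty User-Agent as automated is an assumption made for the example.

```python
import re

# Illustrative signatures only; a maintained list would be far longer.
BOT_PATTERN = re.compile(
    r"googlebot|bingbot|crawler|spider|uptimerobot|curl|python-requests",
    re.IGNORECASE,
)


def is_known_bot(user_agent: str) -> bool:
    """Return True if the User-Agent matches a known bot signature."""
    if not user_agent:
        return True  # assumption: a missing User-Agent is treated as automated
    return BOT_PATTERN.search(user_agent) is not None


crawler_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

print(is_known_bot(crawler_ua))  # True
print(is_known_bot(browser_ua))  # False
```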

More sophisticated bots that rotate IPs and spoof browser User-Agents are harder to detect. Most analytics tools address part of this by also requiring that the JavaScript snippet executes successfully. HTTP clients that do not run JavaScript cannot trigger the event, which filters out a large class of simple crawlers, although bots driving a full headless browser still get through.

Comparison to sampling-based analytics

Google Analytics 4's free tier applies data sampling to explorations and other ad hoc queries that exceed an event quota. The sample rate varies, but for high-traffic properties, a sampled report might represent only 10-20% of actual traffic. Cookieless analytics tools running on 100% of events often produce more accurate aggregate numbers than sampled GA4 reports, despite the session-linking limitation.

When You Still Need Cookies (or a First-Party Session)

Server-side hashing is appropriate for top-of-funnel analytics: pageviews, referrers, conversion events, and feature usage patterns at the aggregate level. It is not appropriate when you need to track a specific logged-in user's journey across multiple days.

If you want to know that User A signed up on Monday, upgraded on Wednesday, and churned on Friday, you need a persistent identifier tied to that user's account. This is exactly what your application's authentication session already provides.

The correct architecture for this use case: your application assigns each user a persistent identifier at signup (typically a UUID that is your database primary key). When you log analytics events for logged-in actions, you pass that identifier (or a hash of it) as a property on the event. This is a first-party identifier derived from a contract the user agreed to (your terms of service), not a third-party tracker.

Abner supports passing custom event properties, so you can instrument this directly. Pass user_id: sha256(user.id + server_secret) as a property and you get per-user journey analytics without storing raw user IDs in your analytics database.
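Here is a minimal sketch of deriving that property in Python. It swaps the plain sha256(user.id + server_secret) concatenation for HMAC-SHA256, which is the standard construction for keyed hashing and serves the same purpose. The function name and the example secret are hypothetical, and how the property is attached to an event depends on the analytics tool's own API.

```python
import hashlib
import hmac


def pseudonymous_user_id(user_id: str, server_secret: bytes) -> str:
    """Derive a stable, non-reversible analytics ID from an account ID."""
    return hmac.new(server_secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical values; in practice the secret comes from server-side configuration.
secret = b"replace-with-a-long-random-server-secret"
account_id = "7c9e6679-7425-40de-944b-e07fc1f90ae7"

analytics_id = pseudonymous_user_id(account_id, secret)
print(analytics_id == pseudonymous_user_id(account_id, secret))  # True: stable per user
```

Because the secret never leaves your server, someone with access only to the analytics data cannot map these values back to account IDs.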

Cookieless analytics is not a second-best option. For aggregate user behavior it is a legally safer approach, and it often yields more complete data because no visits are lost to declined consent banners. The implementation requires thinking carefully about session boundaries and bot filtering, but none of that complexity falls on you as the site owner. It is handled inside the analytics tool.

The session hashing approach is not magic. It involves a real tradeoff: you get clean, complete analytics data without consent overhead, but you give up persistent cross-visit identity. For most SaaS analytics use cases, that is the right tradeoff. When you need cross-day user tracking, your application's authentication session is the right tool for that job.