“Ultimately, Captchas are useless for spam because they’re designed to tell you if someone is ‘human’ or not, but not whether something is spam or not.”
You do not have to work in IT to know what spam is. Besides piles of unwanted e-mail, there are spam bots: software programs designed to pose as human Web site visitors and post unwelcome messages across the Internet to advertise dubious services, although more often than not these messages do not even make much sense. Just as bacteria and viruses mutate to resist the drugs used against them in the real world, spam bots are becoming more resilient at penetrating Internet firewalls and security layers. We interviewed the creators of CleanTalk, an impressive service that protects thousands of blogs and forums with its highly efficient spam-filtering engine.
NubisNovem: Let us start with introductions. Please tell our readers how the service was created: who is behind its internal architecture, where the idea originated, and who first conceived and implemented the CleanTalk cloud and worked on the plugins for various blog platforms.
CleanTalk: The idea for the service was conceived by Denis Shagimuratov, the project founder. About five years ago, while helping to maintain a Web forum running phpBB, Denis ran into the problem of spam posts. The remedies available back then were not fully up to the task and did not filter all spam messages. That was when his own anti-spam solution began to take shape. Denis built the first plugins, the system infrastructure, the server and Web site configuration, the control panel, and so on, all on his own. The very first module was released for phpBB in May 2011. Over time, three other team members joined CleanTalk, one of whom later left. Initially, Denis was the only one working on the project full time, while Alex Bezborodov and Alex Znaev were moonlighting.
NN: Was the service intended to filter blog spammers specifically, or was it meant for something else? Was there a particular mathematical relation, algorithm, or model behind it?
CT: At the beginning, it was intended for phpBB forums only. Then our customers started requesting protection for other platforms. Gradually, new modules and plugins emerged as we targeted the most popular content management systems (CMS). As originally designed, our service was meant to catch any type of spam, whether generated by scripted spam bots or posted manually. Spam comments written by people were detected by an algorithm that weighed the relevance of their language to the topic. We soon realized there was practically no demand for filtering manual spam, so we switched to protecting against spam bots exclusively. To do that, we analyze the behavior patterns of real visitors and compare them against the behavior of spam bots, which lets us filter and block automated postings effectively.
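CleanTalk does not disclose its actual detection logic, so the following is only a minimal Python sketch of the general idea of scoring a submission by how closely its behavior matches a human visitor's. The four signals (`fill_time_seconds`, `js_enabled`, `mouse_events`, `hidden_field_filled`), their weights, and the 0.5 threshold are hypothetical assumptions chosen for illustration; they are not CleanTalk's metrics.

```python
from dataclasses import dataclass

# Hypothetical behavioral signals for one form submission.
# Names, weights, and threshold are illustrative assumptions only.
@dataclass
class Submission:
    fill_time_seconds: float   # time between page load and form submit
    js_enabled: bool           # did client-side JavaScript run?
    mouse_events: int          # pointer/keyboard events observed on the page
    hidden_field_filled: bool  # honeypot field that humans never see

def bot_score(s: Submission) -> float:
    """Return a score in [0, 1]; higher means more bot-like."""
    score = 0.0
    if s.fill_time_seconds < 3.0:   # humans rarely complete a form this fast
        score += 0.35
    if not s.js_enabled:            # many simple scripted bots skip JavaScript
        score += 0.25
    if s.mouse_events == 0:         # no interaction recorded at all
        score += 0.15
    if s.hidden_field_filled:       # bots tend to fill in every field
        score += 0.25
    return min(score, 1.0)

def is_spam_bot(s: Submission, threshold: float = 0.5) -> bool:
    """Classify as a bot when the combined score reaches the threshold."""
    return bot_score(s) >= threshold

if __name__ == "__main__":
    human = Submission(fill_time_seconds=42.0, js_enabled=True,
                       mouse_events=120, hidden_field_filled=False)
    bot = Submission(fill_time_seconds=0.8, js_enabled=False,
                     mouse_events=0, hidden_field_filled=True)
    print(is_spam_bot(human), is_spam_bot(bot))  # False True
```

Combining several weak signals into one score means no single check has to be decisive, which is one plausible reason for basing detection on many metrics rather than one.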
NN: As of today, you report more than 104,000 active subscriptions. How many subscribers or participating blogs did you start with? How did you overcome the teething problems to achieve successful growth? At what point did the number of customers grow large enough for you to realize that the system really worked?
CT: We would like to clarify that our current figure of 104,000 is the number of subscribed Web sites. Some of our bigger customers, such as Web design studios and similar businesses, maintain 30 to 50 active CMS Web sites each. It is hard to say exactly when the number of subscribers became crucial for the service; we would point to the first half of 2014 as the turning point. By then, we had already broken even on hosting expenses, but it was not until August 2014 that it became evident CleanTalk had gained real traction as a business and was growing quickly. The real challenge was to attract a steady influx of new customers. We have experimented a great deal with the elements that affect conversion, and we have never actually stopped refining them. We continue to analyze the results, consider new approaches, and introduce changes.
NN: How many people are on the team, and how are responsibilities distributed among them? What is the technical support workload? Do users send many technical questions, or are most of them content and quietly enjoying the service?
CT: Right now we have five people on board, and tasks are distributed as follows. One person takes care of the servers and databases. A programmer is responsible for the client applications (modules and plugins). Another works on the Web site (back end and front end). One more is in charge of business development and promotion. Finally, Denis oversees and manages the whole project and is involved in implementing every part of it. Some tasks we outsource as side projects to freelancers and consultants. Users come to us with issues ranging from questions like “How does this service work without a CAPTCHA?” and “How do spam bots bypass CAPTCHAs?” to requests for help integrating scripts with their Web sites. Sometimes our decision engine makes a mistake and a Web form lets spam through. We investigate every such case in full to determine its causes, based on our customers’ requests and feedback. Depending on the causes found, we either adjust the parameters of the engine that detects spam bots or update the source code of some plugins.
NN: Our own experience confirms that CleanTalk is highly effective. For a year and a half, we have been running it to protect our two WordPress blogs with only one (!) false negative, while thousands of spam comments were detected and filtered successfully. How did you achieve this? What is behind the CleanTalk cloud: well-known technologies, or your own inventions and know-how? Could you share some details about the system? Do you use specific hosting providers or platforms for parallel computation, such as Amazon AWS, Docker, or the like? What programming languages do you use?
CT: We do not use any out-of-the-box software solutions or filtering technologies; instead, we rely on ideas and technologies developed in-house. CleanTalk’s spam detection logic is based on 20 different metrics. We rely on dedicated servers hosted in several countries, at various geographical locations. All our data are processed there, and our technology does not require extensive computing power. The system scales very easily: we just add more servers when needed. We also strive for an optimal load on server software, database engines, and other resources.
NN: How vulnerable is CleanTalk to external attacks and to accidents such as hardware failures or data loss caused by logical errors? How well are the internal spam data and subscriber details protected against unauthorized access?
CT: This is a delicate point, because security is difficult to assess. Any service or site can turn out to be vulnerable; there is no silver bullet. As we mentioned, our servers and providers are spread across several countries, and the content of all servers is identical. If one server goes down, requests are redirected to the nearest available one, so situations like this should not affect our customers. Our backup copies cannot be accessed from outside.
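The interview does not say how CleanTalk's redirection is implemented. One simple way to realize "redirect to the nearest available server" is to order the candidate servers by an estimated distance or latency and pick the first one that passes a health check; because every server holds identical content, any healthy server can answer. In this minimal Python sketch, the server names, latency figures, and the `is_healthy` callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass(frozen=True)
class Server:
    name: str          # hypothetical server identifier
    latency_ms: float  # measured or estimated distance to the client

def pick_server(servers: Sequence[Server],
                is_healthy: Callable[[Server], bool]) -> Optional[Server]:
    """Return the nearest healthy server, or None if all of them are down."""
    for server in sorted(servers, key=lambda s: s.latency_ms):
        if is_healthy(server):
            return server
    return None

if __name__ == "__main__":
    servers = [Server("eu-1", 25.0), Server("us-1", 110.0), Server("asia-1", 210.0)]
    down = {"eu-1"}  # simulate an outage of the closest node
    chosen = pick_server(servers, lambda s: s.name not in down)
    print(chosen.name if chosen else "no server available")  # us-1
```

In a real deployment the health check would typically be a periodic probe rather than a per-request call, but the selection logic stays the same.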
NN: Have you ever been contacted by spam organizations asking you to whitelist them, so that your system would let their spam comments through from time to time and it would look like a random slip?
CT: No, we have never received a request like that. It is a matter of principle for us: we value our reputation above any payout, and we would not allow that to happen.
NN: Have you ever been contacted by government organizations, for instance seeking help with spam filtering or access to lists of spammers, such as IP addresses?
CT: No, we have not been approached by such bodies either. We are open to collaborating on protection against spam and on fighting spam in any form. Our spammer blacklists might be useful to hosting and Internet service providers. Using that information, a provider could take safety measures, such as alerting customers that their computers are infected with viruses that send out spam.
NN: By the way, we were curious why the profile link in the CleanTalk weekly e-mail report works without a password. Could that be a security issue? Or is the link personalized, so that it works only for the person it was intended for? Do you use the same browser-fingerprinting technology that you use for spam detection?
CT: The profile link is indeed personalized and works only for the individual for whom it was generated. The actions permitted to a visitor arriving via that link are limited: if an intruder gained access to your profile through that URL, all they could do is pay your invoice or browse the service requests. The link is mainly intended to give quick, convenient access to processing details. Besides, many Web sites these days are maintained by more than one person, and not requiring a password for some procedures makes their lives easier.
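CleanTalk did not describe how its personalized links are generated. A common pattern for password-less, limited-scope links is to sign the user ID (and the permitted scope) with a server-side secret using HMAC, then on every visit verify the signature and check that the requested action is in scope. This minimal Python sketch uses only the standard library; the secret, the URL layout (`u` and `sig` parameters on `example.com`), and the action names are hypothetical.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qs, urlparse

# Hypothetical server-side secret; in practice load it from configuration.
SECRET_KEY = b"replace-with-a-long-random-secret"
# Limited scope: the only actions a link holder may perform.
ALLOWED_ACTIONS = ("pay_invoice", "view_requests")

def _sign(user_id: str) -> str:
    # Bind the signature to both the user and the permitted scope, so
    # changing either one invalidates previously issued links.
    payload = f"{user_id}|{','.join(ALLOWED_ACTIONS)}".encode()
    digest = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode().rstrip("=")

def make_profile_link(user_id: str) -> str:
    return f"https://example.com/profile?u={user_id}&sig={_sign(user_id)}"

def authorize(link: str, action: str) -> bool:
    """Allow the action only for a valid signature and an in-scope action."""
    query = parse_qs(urlparse(link).query)
    user_id = query.get("u", [""])[0]
    sig = query.get("sig", [""])[0]
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(_sign(user_id), sig):
        return False
    return action in ALLOWED_ACTIONS

if __name__ == "__main__":
    link = make_profile_link("42")
    print(authorize(link, "pay_invoice"))      # True
    print(authorize(link, "change_password"))  # False: out of scope
    forged = link.replace("u=42", "u=43")
    print(authorize(forged, "pay_invoice"))    # False: signature mismatch
```

The design keeps the damage of a leaked link small: the signature proves who the link was issued to, while the fixed action list caps what its holder can do.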
NN: Apart from the obvious goal of supporting more blog platforms, what is your business development plan? What should your customers expect in the future? Is the subscription price or structure going to change?
CT: The subscription and pricing terms will not change. We consider $8 per year a well-balanced price point that offers value to both sides, and we have no immediate plans to change it. What we may adjust in the future is the trial period. Our short-term plan is to promote and further develop the spam firewall, which meets the existing demand for that feature and reduces Web server load by blocking spam bots before they reach Web pages. Another plan is to add more behavioral metrics for spam bot detection, which should bring detection accuracy up to 99.998%.
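The spam firewall's internals are not described here. Conceptually, blocking bots "before they reach Web pages" means checking the visitor's IP address against a blocklist as the very first step of request handling, before any template rendering or database work is done, which is where the load reduction comes from. A minimal Python sketch using the standard `ipaddress` module follows; the blocklist entries (reserved documentation ranges) and the `firewall` and `render_page` names are hypothetical.

```python
import ipaddress
from typing import Callable, Iterable

def load_blocklist(entries: Iterable[str]):
    """Parse single addresses and CIDR ranges into network objects."""
    return [ipaddress.ip_network(e, strict=False) for e in entries]

def firewall(blocklist, handler: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a page handler so that blocked IPs never reach it."""
    def guarded(client_ip: str) -> str:
        addr = ipaddress.ip_address(client_ip)
        # Membership is False across IP versions, so mixed v4/v6 lists are safe.
        if any(addr in net for net in blocklist):
            return "403 Forbidden"  # rejected before any page work is done
        return handler(client_ip)
    return guarded

def render_page(client_ip: str) -> str:
    # Stand-in for the expensive part: templates, database queries, etc.
    return "200 OK"

if __name__ == "__main__":
    # 192.0.2.0/24 and 198.51.100.7 are reserved documentation addresses.
    blocklist = load_blocklist(["192.0.2.0/24", "198.51.100.7"])
    page = firewall(blocklist, render_page)
    print(page("192.0.2.55"), page("203.0.113.9"))  # 403 Forbidden 200 OK
```

A production firewall would use a faster lookup structure than a linear scan for large lists, but the placement of the check in front of the page handler is the essential point.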
NN: Would it be possible to use a similar approach to filter spam e-mail? Recently, e-mail providers such as Google and Yahoo! have resorted to aggressive anti-spam measures, drawing harsh criticism because of an increased number of false positives. Could CleanTalk’s spam-filtering service offer everyone a better, more accurate, and more effective anti-spam solution for e-mail?
CT: We have not done any research on filtering spam on e-mail servers. In theory, we could share parts of our database, such as lists of spammer IP and e-mail addresses, with e-mail spam-filtering systems. However, it would be difficult to adapt CleanTalk’s algorithm to e-mail filtering, because many e-mail users rely on dedicated software to work with their mail, and it would be much harder to distinguish legitimate use of such software from spam bots.
NN: Do you have a message for your users, or perhaps for bloggers who are not yet familiar with CleanTalk and are not your subscribers? Are you happy with your user feedback and the popularity of your service? Do you see potential for future growth, or are you pretty much hitting a ceiling?
CT: We greatly value the feedback and support we receive from our users; it truly helps us improve our service. We are as grateful for critical feedback as we are for expressions of gratitude. There is still a lot to be done to promote and grow the service. The global number of Web sites has exceeded one billion, which opens up a wide field of opportunities for us. To those who have not yet seen CleanTalk in action, we suggest installing it, giving it a try, and comparing our solution with competing services. Then send us your comments, suggestions, and criticism.
NN: What is your strategy for subscribers who do not renew? Do you send reminders? While searching online, I have seen a few blogs displaying the note “Antispam disabled. Check access key in CleanTalk plugin options” or something along those lines. How do you deal with those?
CT: To remind users to renew, we send several notices before the expiry date: an alert is displayed on the Web site dashboard, an e-mail is sent 30 days before the renewal date, and a final notice warns that access to the service is about to be disabled. That is clearly enough to achieve a renewal conversion rate of up to 55–65%. Approximately half of the non-renewing users are Web sites that have closed or owners who have stopped maintaining their sites. Some users try free solutions and then come back to renew their subscriptions with us. Some Web site maintainers simply disable commenting and contact forms on their sites for good as unnecessary. We consider this conversion rate a sign of a loyal customer base and a positive trend, proof that our hard work during the year was not in vain.
NN: Which competitors do you consider worth mentioning for comparison? Akismet? Anybody else? How are they different, and what makes your service unique?