Introduction to website categorization API

Many companies rely on a website categorization API to support their services.

For example, an app that provides web content filtering needs to know whether the website a user wants to visit is a shopping, gaming, social media or other kind of site that the employer does not want employees to visit during work time.

This is where a website categorization API helps: it has either pre-categorized the websites already or can categorize them on the fly when the user enters the URL in a browser or elsewhere.
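
A minimal sketch of how a content filter might use such a service. The lookup table, the blocked categories and the `classify_on_the_fly` fallback are hypothetical stand-ins for a real categorization API, which is not specified here.

```python
from urllib.parse import urlparse

# Hypothetical pre-categorized domains; a real service would hold far more.
PRECATEGORIZED = {
    "shop.example.com": "Shopping",
    "games.example.com": "Gaming",
    "docs.example.com": "Business",
}

# Assumed policy: categories the employer does not want visited at work.
BLOCKED_CATEGORIES = {"Shopping", "Gaming", "Social Media"}


def classify_on_the_fly(domain):
    """Placeholder for a live classification call; always 'Unknown' here."""
    return "Unknown"


def get_category(url):
    """Return the category of a URL, checking the pre-built table first."""
    domain = urlparse(url).netloc.lower()
    return PRECATEGORIZED.get(domain) or classify_on_the_fly(domain)


def is_allowed(url):
    """Allow the visit unless the URL falls into a blocked category."""
    return get_category(url) not in BLOCKED_CATEGORIES


print(is_allowed("https://shop.example.com/cart"))   # False
print(is_allowed("https://docs.example.com/guide"))  # True
```

In this sketch unknown sites are allowed by default; a stricter policy could block them instead.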

When it comes to website categorization, there are many definitions, or so-called taxonomies, in use.

The best known is the IAB taxonomy from the Interactive Advertising Bureau. It is especially popular in the advertising sector because it is tailored to that sector's requirements.

It has several different Tiers with some of the lower ones supporting over 400 different categories.

Here are a few example categories from the IAB taxonomy:

Automotive
Books and Literature
Business and Finance
Careers
Education
Events
Holidays
Attractions
Shopping
Personal Celebrations & Life Events
Family and Relationships
Fine Art
Food & Drink
Healthy Living
Hobbies & Interests
Home & Garden
Medical Health

This is just for the top tier.

If you need more refined categories for website categorization, you need to use the lower tiers, down to Tier 5.

Useful resources for content categorization:

https://www.tensorflow.org/

https://www.npmtrends.com/productcategorization

https://openbase.com/js/productcategorization/documentation

New trending products and goods to sell in 2022

In this article we explore new trending products to sell in 2022, using a SaaS platform that was developed for this purpose.

The trending products and goods were determined from the number of searches on various search engines over the last 12 months.

We used the URL classification tool from https://www.productcategorization.com/ to classify products into three tiers of categories.

For Tier 1, the categories are:

Animals & Pet Supplies
Apparel & Accessories
Arts & Entertainment
Baby & Toddler
Business & Industrial
Cameras & Optics
Electronics
Food, Beverages & Tobacco
Furniture
Hardware
Health & Beauty
Home & Garden
Luggage & Bags
Mature
Media
Office Supplies
Religious & Ceremonial
Software
Sporting Goods
Toys & Games
Vehicles & Parts

Tier 2 has 192 categories and Tier 3 has 1349 categories.

Let us first look at the Apparel & Accessories category.

A huge jump in interest for the Mirabel dress:

Example description for Mirabel dress:

The Mirabel Dress is the latest addition to our line of women’s dresses available at our online store, and it’s sure to make you look as great as you feel.

It’s made of 100% cotton, so it breathes well and feels soft against your skin. It has a woven material, so you can be sure of its durability. And it’s got a classic style that never goes out of fashion.

Next is the Infinity Hoop:

Example description for the Infinity Hoop:

The Infinity hoop is a weighted hula hoop that’s fun and effective. The unique, weighted design of the hoop makes it easy to use. Even if you’ve never tried a hula hoop before, a few spins with the Infinity will prove that this toy is anything but child’s play.

The next product that has experienced a surge in interest is Sky Glasses:

Example description for Sky Glasses:

Look to the skies with these stylish, yet essential Sky glasses. Featuring UV400 protection, scratch-resistant lenses and five interchangeable color frames – including tortoise shell and a variety of colored lacquered hardwoods – they’re sure to provide not only great sun protection but plenty of style as well. Plus, they come with a fashionable matching vinyl case.

Why the need for trending products and keywords research?

Online shopping has made all kinds of products more available to consumers—and the options available to them are growing.

At the same time, the global e-commerce market is becoming more and more demanding. We expect more out of everything that we buy, and we want it at a cheaper cost.

So how can you, as an online seller of products and services, stay on top?

The answer is simple: by staying ahead of trends.

Using data to identify which trends are heading your way will help you prepare for what’s coming next, so that you can serve your customers better and make sure you’re not left behind when people’s tastes change.

Trend research can be viewed as a form of keyword research. Looking for trends in credit repair, for instance, is similar to finding the best credit repair keywords.

Google Trends

With so much competition, it’s more important than ever to have a clear idea of what your customers want.

Luckily, there are a few things you can do to get a better handle on what’s selling and what isn’t selling so well. Using Google Trends to find out which products are trending can help you keep up with the market and avoid getting left in the dust by your competition!

Here are some tips and tricks for using Google Trends successfully:

Look at trends from the past. If you want to know what people will buy next year, look at trends from last year! For example, to see whether interest in coffee makers is trending up or down in November 2021 (and thus to estimate coffee maker sales for 2022), we would go back one full year, check how many searches were made for coffee makers in November 2020, and compare that number with the one for November 2021. You can also use this technique when looking at products that don’t have seasonal changes, like clothing items or electronics.
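
A small sketch of this kind of comparison: the percent change in search interest for a month versus the same month one year earlier. The function name and the interest values are made up for illustration; real values would come from Google Trends.

```python
def year_over_year_change(interest, month):
    """Percent change in interest for `month` versus the same month a year earlier.

    `interest` maps (year, month) tuples to a search-interest score.
    """
    year, month_number = month
    previous = interest[(year - 1, month_number)]
    current = interest[(year, month_number)]
    return (current - previous) / previous * 100.0


# Illustrative numbers only, not real Google Trends data.
coffee_makers = {(2020, 11): 60, (2021, 11): 75}

change = year_over_year_change(coffee_makers, (2021, 11))
print(f"{change:+.1f}%")  # +25.0%
```

A positive value suggests growing interest compared with the previous year, a negative value a decline.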

Google Trends data can be very useful for a machine learning consulting project, because it offers both breadth (billions of keywords available) and a time dimension (with pytrends you can get data for 5 years).

 

Bitcoin Fear and Greed Index

Bitcoin is the largest cryptocurrency in the world by market cap.

It is also the one that started it all, with the white paper published in 2008 and the network launched in 2009. For those interested, the paper is available here: https://bitcoin.org/bitcoin.pdf

It is quite amazing how far the digital money revolution has gone, and after all these years we still do not know who Satoshi Nakamoto is.

Bitcoin, and the cryptocurrency market in general, has experienced multiple bear and bull markets since its beginning, with some huge returns and also steep losses.

These huge swings in emotion can be captured by computing the sentiment of tweets published about Bitcoin and then building a tracker, or indicator, of Bitcoin fear and greed.

The way one does this is as follows:

– every hour, collect all published tweets that contain the word Bitcoin and/or the cashtag $BTC

– compute the sentiment of each tweet using a machine learning model. One does not necessarily need a complex neural network such as an LSTM; logistic regression or a support vector machine is enough for classifying sentiment polarity

– from this, compute the average sentiment in each hour

– convert this to a scale of 0-100. This is necessary because sentiment values usually lie between -1 (negative) and 1 (positive).

This then results in a Bitcoin Fear and Greed Index indicator.
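
The last two steps can be sketched as follows, assuming each tweet has already been scored by a sentiment model. The function name, the sample tweets and the linear rescaling from [-1, 1] to [0, 100] are illustrative assumptions; a provider may use a different mapping.

```python
from collections import defaultdict
from datetime import datetime


def fear_greed_index(scored_tweets):
    """Map hourly average sentiment in [-1, 1] to an index in [0, 100].

    `scored_tweets` is an iterable of (timestamp, sentiment) pairs.
    """
    by_hour = defaultdict(list)
    for timestamp, sentiment in scored_tweets:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        by_hour[hour].append(sentiment)

    index = {}
    for hour, values in sorted(by_hour.items()):
        mean = sum(values) / len(values)
        index[hour] = (mean + 1.0) / 2.0 * 100.0  # -1 -> 0, 0 -> 50, 1 -> 100
    return index


# Made-up scored tweets for illustration.
tweets = [
    (datetime(2022, 1, 1, 9, 5), 0.8),
    (datetime(2022, 1, 1, 9, 40), 0.2),
    (datetime(2022, 1, 1, 10, 15), -0.5),
]

for hour, value in fear_greed_index(tweets).items():
    print(hour, round(value, 1))
```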

Here are example charts from a provider of Fear and Greed indices, https://cryptofeargreedindex.com:

 

 

Ethereum Fear and Greed Index

A Fear and Greed Index is a way to capture the current emotions, greed and fear, about a given market. There are fear and greed indices for stock markets, but also for the crypto market. The latter capture the sentiment expressed towards cryptocurrencies like Bitcoin, Ethereum, Cardano and others.

Most often, one measures the fear and greed index by collecting social media posts about these coins and then computing the sentiment polarity of these posts, using some form of machine learning model that has been trained on a sufficient number of social media posts from Twitter, Facebook, Instagram and other platforms.

By focusing on individual coins, such as Bitcoin or altcoins like Ethereum, one can compute individual fear and greed indices, e.g. an Ethereum Fear and Greed Index or a Bitcoin Fear and Greed Index.

If one wants to know the sentiment and emotions of the overall market, it is best to use some form of market-cap-weighted approach.
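
One possible form of such a weighting, as a sketch: each coin's index contributes in proportion to its market capitalization. The function name, index values and relative market caps are made up for illustration.

```python
def market_index(coin_indices, market_caps):
    """Market-cap-weighted average of per-coin fear and greed indices."""
    total_cap = sum(market_caps[coin] for coin in coin_indices)
    weighted = sum(coin_indices[coin] * market_caps[coin] for coin in coin_indices)
    return weighted / total_cap


# Illustrative values only: per-coin indices and relative market caps.
indices = {"BTC": 70.0, "ETH": 40.0}
caps = {"BTC": 3.0, "ETH": 1.0}

print(market_index(indices, caps))  # 62.5
```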

The Ethereum Fear and Greed Index is thus computed, e.g. with a latency of 1 hour, by averaging the sentiment of social media posts about Ethereum or its ticker ETH and then computing percentiles, which map the index value to five possible states:

– extreme greed

– greed

– neutral

– fear

– extreme fear
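
A sketch of the percentile step, under the assumption that the current average sentiment is ranked against a history of past values and that the five states are split at the 20th, 40th, 60th and 80th percentiles. Both the cutoffs and the sample history are illustrative choices, not a provider's actual settings.

```python
from bisect import bisect_right

STATES = ["extreme fear", "fear", "neutral", "greed", "extreme greed"]


def percentile_rank(history, value):
    """Share of historical values that are <= value, in percent."""
    ordered = sorted(history)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def index_state(history, value, cutoffs=(20, 40, 60, 80)):
    """Map a sentiment value to one of five states via its percentile rank."""
    rank = percentile_rank(history, value)
    return STATES[bisect_right(cutoffs, rank)]


# Made-up history of hourly average sentiment values.
history = [0.1, -0.3, 0.5, 0.0, 0.2, -0.6, 0.4, -0.1, 0.3, -0.2]

print(index_state(history, 0.45))   # extreme greed
print(index_state(history, 0.05))   # neutral
print(index_state(history, -0.5))   # extreme fear
```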

The sentiment classification models used for this belong to the class of natural language processing models; they are essentially text classification models. Other examples of text classification include product categorization and opinion mining. An interesting variant is multi-label classification, where one does not choose between mutually exclusive labels but can predict one or more labels for a given text.

One application of multi-label classification is, for example, product tagging.

Here is an excellent library that supports multi-label classification: http://manikvarma.org/code/Parabel/download.html

Explainable Artificial Intelligence or XAI

In recent years, we have been increasingly influenced by machine learning models, most of the time without even realizing it.

When you get movie recommendations from Netflix, they are produced by a special class of machine learning models called recommender systems. Netflix has a nice page showcasing its research and use of machine learning models:

https://research.netflix.com/

The next time you visit a bank to apply for a loan, the decision may actually be made not by a human but by an ML model.

Another important area of our lives affected by AI is medical diagnosis, which is increasingly driven by AI models that learn from vast data sets containing millions of historical diagnoses. The doctor of the future may still be a human, but one who only explains a suggestion or diagnosis made by an AI model.

As the number of decisions made by AI models rises, we are increasingly confronted with important questions: How do AI models make decisions? Why was this particular person rejected when applying for a bank loan?

A major difficulty in answering these questions is that most machine learning models are more or less black boxes. Although some simpler ones, like logistic regression, decision trees and linear regression, are more easily understood, more complex ones like gradient boosting machines and deep neural networks are mostly too complex to interpret easily.

This has led to the development of special methods that help us interpret even the most complex ML models. As these methods do not depend on the specifics of a particular ML method, they are known as model-agnostic methods. They are an important class of methods for data science consultants.

Whatever the problem and the ML algorithm, we can apply them to understand a trained ML model better and to compare models.

Some of the most well known Explainable Artificial Intelligence or XAI methods are LIME, SHAP and Permutation Feature Importance.

Permutation Feature Importance works as follows: to find out how important a feature F is for the model’s predictions, one permutes the values of this feature and recomputes the model’s error. If the error jumps, we deem the feature important.
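
The procedure can be sketched in plain Python. The toy model, the data and the choice of mean squared error are assumptions made for illustration; libraries such as scikit-learn provide their own implementations.

```python
import random


def mean_squared_error(predict, rows, targets):
    """Average squared difference between predictions and targets."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Average increase in error after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving all other features untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            error = mean_squared_error(predict, shuffled, targets)
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy "trained model" that uses feature 0 and ignores feature 1.
def model(row):
    return 3.0 * row[0]


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3.0 * row[0] for row in X]

print(permutation_importance(model, X, y))
```

Shuffling feature 0 raises the error noticeably, while shuffling feature 1 leaves it unchanged, which matches the fact that the toy model ignores feature 1.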

LIME works a bit differently. Here one trains an approximate linear model around the data instance whose prediction we are trying to explain. Linear models are more easily interpretable: one just needs to look at their coefficients. One can then use this simplified linear model to explain which feature values contributed most to the outcome.
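
A simplified sketch of the idea behind LIME, not the LIME library itself: sample points around the instance, weight them by proximity, and fit a weighted linear model. The Gaussian perturbations, the kernel, the parameter defaults and the toy black-box function are all assumptions made for illustration.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def local_linear_explanation(predict, instance, n_samples=500,
                             scale=0.5, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear model around `instance`.

    Returns (intercept, coefficients) of the local surrogate model.
    """
    rng = random.Random(seed)
    d = len(instance)
    # Accumulate the weighted normal equations (Z^T W Z) beta = Z^T W y.
    ztz = [[0.0] * (d + 1) for _ in range(d + 1)]
    zty = [0.0] * (d + 1)
    for _ in range(n_samples):
        sample = [x + rng.gauss(0.0, scale) for x in instance]
        dist2 = sum((s - x) ** 2 for s, x in zip(sample, instance))
        weight = math.exp(-dist2 / kernel_width ** 2)  # closer points count more
        z = [1.0] + sample  # leading 1.0 is the intercept term
        target = predict(sample)
        for i in range(d + 1):
            zty[i] += weight * z[i] * target
            for j in range(d + 1):
                ztz[i][j] += weight * z[i] * z[j]
    beta = solve_linear_system(ztz, zty)
    return beta[0], beta[1:]


# Toy black box: nonlinear in feature 0, linear in feature 1.
def black_box(x):
    return x[0] ** 2 + 3.0 * x[1]


intercept, coefficients = local_linear_explanation(black_box, [2.0, 1.0])
print(coefficients)  # roughly the local slopes 4 and 3 near the instance
```

The coefficients of the surrogate are the explanation: near the instance, both features push the prediction up, with feature 0 having the slightly larger local effect.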

SHAP is based on Shapley values, introduced by Lloyd Shapley in 1953. Shapley later received the 2012 Nobel Memorial Prize in Economic Sciences.

The Shapley value of a feature value tells us how much it contributed to the “net” prediction (the prediction minus the average baseline prediction). SHAP is the method with the most rigorous theoretical foundation. The SHAP library even has an explainer specialized for deep learning nets, DeepExplainer, which builds on a connection with DeepLIFT.
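
For a handful of features, Shapley values can be computed exactly by enumerating all feature coalitions. The sketch below does this under a simplifying assumption: features outside a coalition are set to the values of a single baseline instance, rather than averaged over a data set. The toy model is hypothetical, and the SHAP library uses faster approximations instead of this brute-force approach.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


# Toy model with an interaction between features 1 and 2.
def model(x):
    return 2.0 * x[0] + x[1] * x[2]


x = [1.0, 2.0, 3.0]
base = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, base)

print(phi)
print(sum(phi), model(x) - model(base))
```

The two numbers on the last line agree: the Shapley values sum to the difference between the prediction and the baseline prediction, which is the “net” prediction described above.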

These methods can help us understand two things about ML predictions:

  • which features are most important in the ML model
  • for a given instance, which feature values contributed most to the outcome probability, and in what direction (positive or negative)