Machine learning techniques were applied to optimise the placement and configuration of cells in a mobile network inside a large shopping centre. The techniques successfully predicted service degradation and detected cell outages and cell anomalies in real time. This work was performed by Automation Consultants’ machine learning partner Thingbook.io.
Background
The arrival and rapid adoption of smartphones has fuelled massive growth in mobile traffic. The only way for mobile operators to meet the demand is by extensive use of small cells. Small cells were originally intended to be sticking plasters: little boosts of low-power spectrum to fill gaps in the radio coverage of macro cells and to enable densification, that is, packing the cells in a network more densely to increase capacity. The macro cells were expected to continue as the main pillars of the mobile network, and to receive all the support and exactitude of design worthy of that role and scale. Network standards were designed for networks composed predominantly of macro cells. However, small cells have proliferated so much that most mobile operators now have what is known as a heterogeneous network (HetNet). A factor in this trend is that approximately 80% of mobile data traffic is now generated indoors, and coverage in indoor public spaces is typically provided by small cells.
A key objective in managing a mobile network is ensuring adequate coverage and sufficient capacity to handle demand. Any cell failure should be detected as quickly as possible. In the short term, neighbouring cells should be reconfigured as quickly as possible to compensate for the outage (e.g. by increasing their power output and/or realigning their signals and antennae), and the outage must be fixed as soon as engineers can be sent to the site. Today’s networks are designed to be self-organising networks (SONs) capable of self-healing, but the fact that they are heterogeneous makes self-healing much harder. In practice, many operators are reluctant to make full use of SON technology, which was not designed for HetNets.
Problem
The problem consisted of optimising a network of small cells set up in a shopping centre. The network had to ensure complete coverage and an acceptable user experience in all parts of the centre. Mobile networks are highly complex, and a cell can fail in such a way that the only sure way of knowing about it is to try to use a phone in the affected area. A cell failure is therefore not always easy to detect before users begin to complain. The conventional monitoring available for the shopping centre network might show an anomaly, but it was often very difficult to deduce the true cause of the problem from it.
Solution
The solution deployed consisted of analysing, in real time, very large quantities of data from conventional monitoring and other sources, and applying machine learning to them. The machine learning could discern patterns in the data, detect likely network failures and infer the most likely root causes. Before operational use, a learning period of four months took place in which data were analysed in a supervised way, and the machine learning software learned to associate anomalies with root causes.
The sources of data analysed were:
- conventional logs from network monitoring;
- billing data (call detail records, CDRs);
- Minimization of Drive Tests (MDT) data, which consists of measurements logged by consenting users’ phones.
Thingbook’s Galileo data harvesting and cleansing engine was used to read and normalise the very large volumes of data. Data preparation is often the most time-consuming part of machine learning, frequently taking longer than the actual analysis. Galileo collects, maps, aggregates and moves large volumes of data automatically, helping to reduce the amount of time spent on it.
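To illustrate the kind of mapping and aggregation involved, the following is a minimal sketch and not Galileo's implementation. It assumes three feeds whose records carry a cell identifier and a Unix timestamp under different, hypothetical field names, maps them onto a common key and counts records per cell per hour.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical field names for each feed; real feeds will differ.
FIELD_MAPS = {
    "monitoring": {"cell": "cell_id", "time": "ts"},
    "cdr": {"cell": "serving_cell", "time": "start_time"},
    "mdt": {"cell": "cellId", "time": "timestamp"},
}


def normalise(source, record):
    """Map a raw record onto a common (source, cell, hour) key."""
    fields = FIELD_MAPS[source]
    when = datetime.fromtimestamp(record[fields["time"]], tz=timezone.utc)
    hour = when.replace(minute=0, second=0, microsecond=0)
    # Cell identifiers are cast to str so that 7 and "7" land in the same bucket.
    return source, str(record[fields["cell"]]), hour


def aggregate(records):
    """Count records per (cell, hour), split by source feed."""
    counts = defaultdict(lambda: defaultdict(int))
    for source, record in records:
        src, cell, hour = normalise(source, record)
        counts[(cell, hour)][src] += 1
    return counts
```

Keying every feed on the same (cell, hour) pair is one simple way to make logs, CDRs and MDT measurements comparable before any analysis is run on them.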
The data were analysed using Thingbook’s Turing machine learning system, which applies machine learning techniques to conduct real-time analysis. Its capabilities include anomaly detection, pattern matching, probable cause analysis, KPI dependencies and performance ranking.
For anomaly detection, Turing uses an unsupervised outlier detection approach, based on Sub-Space Clustering and Multiple Evidence Accumulation techniques, to pinpoint different kinds of network anomalies (more recently, deep learning techniques have been added to the methods used). Turing can recognise patterns which correspond either to newly discovered knowledge or to something learned in the past. In either case, Turing can store and recognise meaningful data behaviours in real time.
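The idea of accumulating evidence across feature subspaces can be sketched as follows. This is not Turing's algorithm: it swaps sub-space clustering for a much simpler robust-distance test (median and median absolute deviation) in randomly chosen feature subspaces, and keeps only the principle that a sample flagged in many subspaces is a stronger anomaly candidate. The function name, parameters and threshold are assumptions made for illustration.

```python
import random
import statistics


def subspace_outlier_scores(rows, n_subspaces=50, subspace_size=2,
                            threshold=3.0, seed=0):
    """Return, per row, the fraction of random subspaces that flag it."""
    rng = random.Random(seed)
    n_features = len(rows[0])

    # Robust centre and spread per feature: median and median absolute deviation.
    columns = list(zip(*rows))
    medians = [statistics.median(col) for col in columns]
    mads = []
    for col, med in zip(columns, medians):
        mad = statistics.median([abs(v - med) for v in col])
        mads.append(mad if mad > 0 else 1e-9)  # avoid division by zero

    votes = [0] * len(rows)
    for _ in range(n_subspaces):
        dims = rng.sample(range(n_features), subspace_size)
        for i, row in enumerate(rows):
            # Scaled Euclidean distance from the robust centre in this subspace.
            dist = sum(((row[d] - medians[d]) / mads[d]) ** 2 for d in dims) ** 0.5
            if dist > threshold:
                votes[i] += 1  # one piece of evidence that row i is an outlier

    return [v / n_subspaces for v in votes]
```

Each row would hold the KPI values of one cell for one time interval. A score close to 1 means the row stands out in almost every subspace examined, while a score of 0 means it never did.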
Results
The main results obtained were as follows.
- Cells were classified into six main groups, based on behaviour patterns. The top five of these groups accounted for 51.6% of the cells.
- 1.3% of the studied behaviour was found abnormal or strongly different to the expected learned model.
- Two cells were repeatedly found in “sleeping/outage” status, affecting a total of 28,739 subscribers, who experienced refused connections and poor quality of experience.
- The handset model and its operating system have a strong influence on call quality. In other words, some handset models work much better than others. The Call Setup Success Rate and the success rate for setting up a radio channel to carry a call (CS RAB Establishment Success Rate) decrease significantly from the most used handset model to the second most used.
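A per-handset comparison of this kind reduces to a success rate computed per group. The sketch below assumes, purely for illustration, that each setup attempt is available as a (handset model, succeeded) pair; the function name and input shape are hypothetical and do not reflect the data format used in the study.

```python
from collections import Counter


def success_rate_by_model(attempts):
    """Compute setup success rate per handset model.

    attempts: iterable of (handset_model, succeeded) pairs.
    Returns {model: (n_attempts, success_rate)}, most used model first.
    """
    totals, successes = Counter(), Counter()
    for model, ok in attempts:
        totals[model] += 1
        successes[model] += bool(ok)
    # most_common() orders models by number of attempts, descending.
    return {m: (n, successes[m] / n) for m, n in totals.most_common()}
```

The same function applies to either KPI: feed it call setup attempts to obtain a Call Setup Success Rate per model, or CS RAB establishment attempts to obtain the corresponding establishment success rate.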
Conclusion
Machine learning was used to diagnose network faults more effectively than conventional monitoring, by discerning patterns in large volumes of raw log data corresponding to real outages and user experience problems. The problems were thus detected early and addressed promptly.