Sunday, January 8, 2017

Introducing CINTI

"CINTI" is a Competition Data Analytics system I have been working on, for some time.

For about a year and a half, it's been a disjointed collection of data crunching scripts, experiments, math channels in I2Pro and other packages, half-formed ideas, and so on. It has finally come together into a package that has some cohesiveness to it. A few teams, racers, and race prep shops have been kind enough to agree to work with me in 2017, to apply real-world experience to what's being built here. Some of their advice has already been tremendously helpful in focusing the work, and I am very excited about these collaborations, and what we are going to create here.

This particular article is not meant to be any sort of marketing pitch, or a presentation of the value CINTI may or may not bring to someone looking to move their data-driven competition to the next level. (OK, that's starting to sound like a sales thing. Sorry!)

Instead, I wanted to talk about some of the motivations and goals behind building this, and to describe some of the principles that are guiding the development.

Here's a brief preview of early prototypes/mock-ups, for a couple of modules:



The main idea is that CINTI CDA will create a kind of "funnel" for several common and (usually) important analysis scenarios, with a focus on "actionable insights". This would be accessible to anyone comfortable with selecting things by dragging boxes around them, and clicking one button out of a dozen or so offered. Here are some examples of analysis scenarios:
  • The driver improved his or her best time at Portland International, by 1.5 seconds.
    • Was it the engine tuning change? Drafting another car on the straight? Better grip from the tires? Maybe driver coaching during the test day made the real difference here?
    • If it was some combination of the above, what proportion of the improvement came from each specific area?
    • Did the tuning change actually slow down the car, but fresh tires and higher corner speeds made up for it, and then some?
    • Can this gain be repeated/taken further? If yes, how? If no, why?
  • The data from 2 cars and 2 drivers is compared. Both post similar performance in terms of lap times.
    • Do cars and drivers perform similarly in all aspects, or are there advantages and shortcomings that balance out?
    • Can we learn something from such a comparison, to make both cars (and drivers) faster?
If you have done this sort of analysis, I bet your experience was probably along one of these 3 scenarios:

1. You already had some knowledge about the car and the driver involved. You coached quite a few drivers, and/or set up quite a few cars. Maybe you run an arrive-and-drive business, or you belong to a group of spec car racers who are willing to talk to each other in between filing all the protests. You had a pretty good guess about what could have been going on, and you were able to look at one of the pre-configured views to confirm or disprove it quickly. That's great. The only problem is that if your guess was correct, and you confirmed it, you may have missed another 3 things that were going on. If your guess was not correct, and your ego is somewhat flexible (unusual but not impossible), you are moving on to Scenario 2 below.
Missing important secondary patterns/trends is probably the most common pitfall here, especially with analysis methods that over-focus on differences in speed traces and section times. Yes, sure, the car was over-slowed for turn 4. But WHY? Yes, the driver went on (or off) the brakes too early, but WHY? Yes, the brakes were used efficiently in all other corners but turn 4, but WHY? Was the difference in braking the MAIN reason the car was over-slowed? Could the car REALLY carry more speed through turn 4, on that day, in that session, on that lap - and if it could - if the driver could - WHY didn't he (or she)*?
You can see the risk of prematurely separating the analysis from the data, and moving into the area of guesses, intuition, and personal expertise. Not a problem, if you have been doing it for years, and you have managed to combine accurate judgement with a flexible ego (rare, but not impossible). What if that's not the case?

2. You open up your analysis software and start piling traces onto each other, looking for patterns that could be related to your question (That's how I started out - bad old days...) If you are lucky, and the software compatible with your logger allows you to lay stuff out efficiently, and mix/match different types of data on the same screen - without things looking like a bowl of spaghetti thrown against the wall - you may even learn something, before you have to pack up your efforts for the night, or your brain starts to hurt - or you have to pack up because your brain started to hurt. Or, you are flinging your food against the wall, in frustration.
In my experience, true "exploratory" analysis is utterly incompatible with "out of the box" analysis software out there. To the point where I would not bother with it anymore, unless I know I have at least 3-4 hours to spend on 1 day's worth of data, and that's for a single car/driver. Or I am getting paid very, very well for my frustrations.

In any other industry, this situation would be anywhere from silly to unacceptable, and would lead to bankruptcies and other kinds of shame and failure, eventually resulting in progress. For some strange reason, in racing, it is the norm to require such an enormous investment of time and expertise, just to answer a series of simple questions.

3. You are a professional (or aspiring) race engineer, you have worked with all types of data out there, you have a library of math channels you developed to make exploratory analysis efficient, and you would not work with a team until they install a dozen sensors on the car, so every critical input and/or response is logged at the right sampling rate.
Life is pretty good. This is where I thought I was going with this.
Life is good - but only until the pulley on the steering sensor comes loose, the fuse on the power supply to the suspension sensors' hub burns out, and the mechanics forget to plug in the harness for the brake pressure senders when they fix a seeping brake line. In addition, no useful information can be extracted from the driver, even after "special measures" have been applied.
OK. Let's see you figure out cornering balance and braking efficiency now, Mr. Engineer!
If you really do this seriously, you know that even the basic accelerometer and speed/position signals have information about those things**. If you are like me, you may have even tried to create "math channels" to extract that, and, inevitably, ran up against the limits of the analysis software - which is why you demand space shuttle levels of instrumentation.

Let me put it this way - if it is possible to look at the speed trace and 2 accelerometer traces, and to make a really good guess about the car, say, understeering mid-corner (I highly recommend the "Data Power" book by Buddy Fey, if you do not believe me on this) - if that is possible, then why can't your basic data analysis software do this today? Oh sure. You can pay MoTeC nearly $1000 for the "Pro" version of I2, install a steering angle sensor, and then use the "Oversteer" channel. The results are underwhelming at best, and confusing at worst, until you've used that for a while... And written yet another math channel to calibrate/clean up the output, and set up just the right view. And the sensors get zeroed out. And then - you have to use someone else's laptop one day - and... oops. How do I do this again?

The popular software packages used for this have been on the market for 20+ years - why can't they do this? The book I mentioned - it was published before I was born! Today, we have software that can sequence a genome and find planets based on nothing but their gravitational effects on things light-years away from us, we can predict the next thing someone will impulsively buy on Amazon, but we need the damn steering angle sensor signal, and 3 math channels, and then another special one for rain conditions, to confirm that the car is unstable in transition after turn-in?

So, my goal is to start moving away from all 3 of these "modes" of analysis, while still having an option for "Expert Exploration".

How?

There are 2 parts to that. First: automating how the information is extracted from the data. Information is something you understand, something you want to know, something you can choose to act upon. If it's there and then it disappears, you are not happy.
Data is data. It's just... there. It's like old paperwork in the box, in the attic. Do you need it? Maybe. Is it worth keeping it around? Maybe. Is it useful now? No. Will it be useful at some point? Maybe. Are you going to be really upset if it disappears? Probably not?

Separating the two has been the focus of R&D done so far, and the results have been simply stunning - more on that, including some demo videos, in a minute. Second: presenting the information in a way that simplifies and encourages actionable insights, instead of making them nearly impossible, without years of experience - and a frustrating process even once you've accumulated that experience. I will talk about that part another time - some research is still ongoing in that area.

For most of the analysis scenarios, similar to what I listed in the beginning, my goal is something like 10 minutes to answer a question (once the data is loaded and initial automated processing completes).
How would this be accomplished? Here's how. I declare the War On Math! Read. My. Lips. No. More. Math.

OK. To be fair. I like math, and it's the core of CINTI (not to mention that math is the very basis of our universe, and our reality at large).

But - to require that racers, driver coaches, and race engineers use math as a critical tool/skill to extract information from data - that is ridiculous, especially in this day and age. This is EXACTLY like requiring one to be able to solder together microprocessor boards - just to use a smartphone. Not to fix or to modify it - just to turn it on and to check the email!
Another analogy - you know how you have to adjust your fuel mixture before you start your street car first thing in the morning, and how sometimes you forget to pull the cable to close the choke after it warms up, as you drive to work? NO? You don't have to do those things? Hmm. That's interesting. Ever wonder why these things got automated?
I really need to drive this point home, so bear with me for the rest of this rant.
I do not know the resistor/capacitor values that are needed to build a volume knob that results in a good range and does not introduce noise. Yet, I can adjust the volume of my radio. I am good at it!
I have no idea how to sort out liquid crystal matrices to end up with large sheets with no defects - yet my TV and my computer monitors have no dead pixels. I do not have a working knowledge of how the internal register and caching structures of popular CPUs on the market are configured, yet I can write software that will work on ANY of them, with predictable efficiency. I can make tea or coffee, yet I have only a very basic understanding of chemistry. I can turn on the lights in my bathroom, without having to figure out the power and voltage values involved.

So, can anyone explain to me why "racing data software" cannot tell me if the car is understeering mid-corner, has a stability issue at turn-in, or if the driver is disrupting the traction balance by the brake application, without me becoming an expert in math and automotive physics, mastering arcane (and often deficient) math channel "languages", not to mention accumulating years of experience driving, fixing, tuning, and building race cars? Anyone?


Here's what's under CINTI's hood. This is a part of the development "sandbox", where I create software that extracts information from data - in this case, extracting the point at which the car is turned into the corner, plus a measure of how quickly the cornering force builds up. To be fair, it is a more complicated problem than it seems at first glance, given the variety of corners, cars, tires, and driving styles. In fact, it is almost insurmountable for most "racing data analysis" packages out there.

Yet, it is easily solved with the right tools and technologies. This is a relatively basic algorithm - it can (and will) be made more accurate (by orders of magnitude), as time goes on. Already, it's approaching the accuracy of a reasonably competent human reviewing the data, while being tens of millions of times faster (in the video, the process is greatly slowed down artificially). Remember, the machine does not need to be more accurate in every case, when it can analyze more data in a few days than all race engineers in the history of racing combined, in their lifetimes.
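To make the idea of "turn-in point plus build-up rate" a little more concrete, here is a minimal Python sketch. To be clear, this is NOT the algorithm CINTI uses - it is the simplest threshold approach I could write down, and everything in it (the function name `find_turn_in`, the parameters `threshold_g`, `hold_samples`, and `target_g`, and their default values) is a made-up assumption for illustration only.

```python
from typing import Optional, Sequence, Tuple


def find_turn_in(
    lat_g: Sequence[float],
    distance_m: Sequence[float],
    threshold_g: float = 0.15,   # assumed: lateral g that counts as "turning"
    hold_samples: int = 3,       # assumed: samples the signal must stay above it
    target_g: float = 0.8,       # assumed: lateral g that counts as "loaded up"
) -> Optional[Tuple[int, Optional[float]]]:
    """Return (turn-in sample index, build-up rate in g per metre), or None.

    Turn-in is the first sample where |lateral g| reaches threshold_g and
    stays there for hold_samples consecutive samples. The build-up rate is
    the average slope from that sample to the first one reaching target_g;
    it is None if target_g is never reached.
    """
    n = len(lat_g)
    for i in range(n - hold_samples + 1):
        window = lat_g[i:i + hold_samples]
        if all(abs(g) >= threshold_g for g in window):
            # Turn-in found; now look for the point where the car is loaded up.
            for j in range(i + 1, n):
                if abs(lat_g[j]) >= target_g:
                    rise_g = abs(lat_g[j]) - abs(lat_g[i])
                    run_m = distance_m[j] - distance_m[i]
                    return i, rise_g / run_m
            return i, None
    return None
```

Even this toy version shows why the problem is harder than it looks: a single fixed threshold will trip on kerbs, corrections on the straight, and noisy accelerometers, which is exactly where the variety of corners, cars, and driving styles starts to bite.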


So, that's great - but this is the "under the hood" view. What if I need to change - tune - how this works for a particular car or track? We are back to math, right?

I don't think this is necessary. Why not tune your Analysis Channels (CINTI has no "Math" Channels, of course, only Analysis Channels) the same way you would tune your tire pressures? Make an adjustment - and get feedback about how it works! Keep adjusting until it works well, and you are done!

This is a prototype interface for this; it will get cleaned up with time. For now, I use it to actually develop analysis logic, and so on - but in the future, this would be a way for an "advanced/pro" user to customize it, to improve the effectiveness of automated/"instant" analysis, or perhaps, even to "teach" the system to recognize and process new things.

You can see me adjusting one out of 3 parameters, to get CINTI to improve how it recognizes the driver initiating corner entry for Turn 2 at Thunderhill. Once that has been done, it will recognize that point, apply changes to all other corners (and update all calculations that use it) for thousands, tens of thousands, whatever number of laps - no more tinkering required.
Again, to be clear - this is not analysis per se, but "tuning" the analysis by changing one out of 3 available parameters, to get the result reflecting the expectations:



By the way, other colored boxes with numbers in them are other Analysis Channels. All of those are created and updated automatically, and contribute to higher-level analysis and "insights".
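If you want a feel for the "adjust, get feedback, repeat" loop described above, here is a rough Python sketch of it. Again, this is an illustration under my own assumptions, not how CINTI is built: the detector is a bare threshold, and the names `detect_turn_in`, `tuning_error`, and `tune_threshold`, as well as the idea of scoring against a few hand-marked turn-in points, are all hypothetical.

```python
from typing import List, Optional, Sequence


def detect_turn_in(lat_g: Sequence[float], threshold_g: float) -> Optional[int]:
    """Index of the first sample whose |lateral g| reaches threshold_g."""
    for i, g in enumerate(lat_g):
        if abs(g) >= threshold_g:
            return i
    return None


def tuning_error(
    laps: List[Sequence[float]],
    marked: List[int],
    threshold_g: float,
) -> float:
    """Mean miss (in samples) against hand-marked turn-in points.

    A lap where nothing is detected is penalised by its full length.
    """
    total = 0
    for lat_g, expected in zip(laps, marked):
        found = detect_turn_in(lat_g, threshold_g)
        total += len(lat_g) if found is None else abs(found - expected)
    return total / len(laps)


def tune_threshold(
    laps: List[Sequence[float]],
    marked: List[int],
    candidates: Sequence[float],
) -> float:
    """Pick the candidate threshold with the smallest tuning error."""
    return min(candidates, key=lambda t: tuning_error(laps, marked, t))
```

The point of the sketch is the feedback signal: once a handful of laps say "this is where turn-in really was", trying parameter values is cheap, and the chosen value can then be applied to every other lap without anyone writing a formula.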

Here's an example where this data is actually used. This is a comparison of 3 drivers, covering about 150 laps (ever tried to look through 150 laps' worth of data for a single corner?). Vertical grouping is based on drivers. Color of the points represents section times (lighter = faster). X (horizontal) axis is the lap distance point at which the car was turned in (we just tuned that).

You can see that there are 2 important patterns here:
1. Driver 3 is consistently initiating the turn earlier than other drivers (the points are shifted to the left of the chart) AND there is a specific range (circled in white) of turn-in points that seems to be closely related to faster section times. Turning in even earlier than that (yellow circle) does not produce faster times.
2. Drivers 1 and 2 have similar turn-in points, but Driver 2 has faster times, on average (more lighter-colored dots). The difference is likely due to something other than this particular aspect.
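The same kind of grouping can be done numerically. Below is a small Python sketch that bins laps by turn-in distance for each driver and averages the section time in each bin, which is one way to spot a turn-in range that goes with faster times. It is only an illustration: the function name `section_time_by_turn_in`, the 5 m bin width, and the tuple layout of the input are my assumptions here, not CINTI's data model.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Lap = Tuple[str, float, float]  # (driver, turn-in distance in m, section time in s)


def section_time_by_turn_in(
    laps: Iterable[Lap],
    bin_width_m: float = 5.0,  # assumed bin size, purely illustrative
) -> Dict[str, Dict[float, float]]:
    """Mean section time per turn-in distance bin, grouped by driver.

    Each bin is labelled by its lower edge, e.g. 400.0 covers 400-405 m.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for driver, turn_in_m, section_time_s in laps:
        bin_start = (turn_in_m // bin_width_m) * bin_width_m
        grouped[driver][bin_start].append(section_time_s)
    return {
        driver: {b: mean(times) for b, times in sorted(bins.items())}
        for driver, bins in grouped.items()
    }
```

If one bin for one driver keeps showing a lower mean section time than its neighbours, that is the numeric version of the white circle on the chart; if two drivers share the same bins but differ in mean time, the difference has to come from somewhere else.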

Switching X axis to another view confirms this - there is a significant difference in Min. Speed for that corner, explaining the difference between Driver 1 and Driver 2's times:



Again, this is not the final "analysis" UI, nor does this represent one of those "analysis funnels", just a demonstration of how "Analysis Channels" are used in higher level calculations/processing.

To be absolutely clear - all of the statistics/measurements/groupings in this example are calculated, processed, and aggregated automatically; there are no "math" channels. If you need to change how CINTI detects "corner entry", there's an adjustment for that - see the last video - but everything else is done automatically.

Igor Levine
Momentary Racing

*"The onion must be peeled even if it's making you cry."

** To be clear, I am not saying that having extensive instrumentation has no purpose. The serious point here is that mainstream analysis software flounders UNTIL extensive instrumentation is in place - and even then, it has some serious issues separating signal from noise, and information from data.