Episode I: The Phantom Ping

Okay team, we have a lot -- and we mean A LOT -- to catch you up on. Because of this, we’re going to release three posts over the next three days. We’re going to dive deep into a few specific areas in these posts:

Episode I: The Phantom Ping

Episode II: Attack of the Calibration

Episode III: Revenge of the Software (and Firmware)

Star Wars puns aside – Episode I (this one!) will cover the completely revamped form of ping detection.

Episode II is going to cover our corrective algorithms: how they came to be where they are today, and the current state of calibration.

Episode III is going to run you through our completely rebuilt software application, our new firmware features, as well as an update on what’s happening with manufacturing and delivery. 

This time around, we thought we’d pass the pen over to the individuals at Mosaic who completed the work in each section. Today’s blog was written by Brennan (Mosaic’s Head of Growth). Brennan joined the team a few months back in a business-focused role, but was thrown head first into ping detection.

Without further ado… let’s get right into it.

Episode I: The Phantom Ping

(Warning: This post is pretty technical. You do not need to read this post to print with a Palette... But we hope you enjoy learning how Palette and its systems work!)

Hi team! I’m Brennan, and I typically help Mosaic on the business side of things. But when we called our all-hands meeting in June (Link), I joined the team that worked to improve pinging & calibration (we’ll get to what that means later). Before getting into pinging, we want to provide some context:

When we think of Palette’s biggest technical challenges, we typically think of them in two buckets. Borrowing a term from McKinsey in the consulting world, we classify challenges by thinking of them in “MECE” terms (Mutually Exclusive, Collectively Exhaustive; pronounced me-see). The idea is that we should be able to classify everything we do on a technical/engineering side into one of these discrete buckets.

There are two primary technical/engineering buckets for Palette: 
1. Splice reliability/consistency (ensuring Palette makes splices within certain specs)
2. Splice timing (ensuring that the right filament shows up at the tip of your printer’s nozzle at the right time)

Many of the challenges we’ve been having with QC and drives (as detailed in these posts - Drive Post 1, Drive Post 2) have been related to the first bucket, splice reliability/consistency. The issues here are typically rooted in physical/hardware properties, and we make changes to hardware (and firmware, to control the hardware) to solve these issues.

On the splice timing side of things, we break this down further into two different systems.

Because we designed Palette to be compatible with as many 1.75mm printers as possible, we knew that it was not feasible to require some sort of digital connection between every 3D printer and Palette. And, as two folks commented on Tested.com’s coverage of Palette, if the splice timing is not within correct tolerances, offsets have the potential to ruin prints:

These two were correct in identifying potential issues, but (understandably) incorrect in believing that Palette cannot work due to these issues. If Palette didn’t have some sort of feedback system, then they’d be right, offsets and printer extrusion inconsistencies would certainly ruin prints. Fortunately, Palette has closed-loop control systems (also known as closed-loop feedback systems).

A closed-loop control system is essentially a system that can read, account for, and correct error. Let’s distill this further. Closed-loop control systems (1) need to know the actual behavior of a system, (2) need to know the desired behavior, and (3) must take action to change the actual behavior to meet the desired behavior.

These are the tasks that Palette’s closed-loop control systems seek to achieve. The systems are called ponging and pinging.

Ponging allows Palette to compensate for error in the amount of filament it produces, compared to the amount of filament it should have produced. (We discussed ponging [“buffer length sensor,” “homing the buffer loop,” “Buffer Measurement System”] in Oct 2015, Nov 2015, Dec 2015.) For ponging, Palette already knows the desired behavior (how much filament it’s supposed to produce) based on the information it receives from the .MSF (Mosaic SEEM File, generated by processing G-code in our free software). Palette’s firmware can check the actual behavior of the system (“If Palette is trying to produce 1m of filament, is it actually creating 1m of filament, or is it producing 0.99m?”) and has the ability to change it. This system is robust, and has worked well for a long time now -- the last time we had a print failure due to inaccurate or dysfunctional ponging was many months ago.
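To make the three closed-loop steps concrete, here is a minimal sketch using the 1m vs. 0.99m example above. The function names and the "add the shortfall to the next segment" policy are our assumptions for illustration only; this is not Palette's firmware.

```python
def ponging_correction(target_m: float, measured_m: float) -> float:
    """Return the production error in metres (desired minus actual).

    Positive means too little filament was produced; negative means too much.
    """
    return target_m - measured_m


def next_segment_length(planned_m: float, target_m: float, measured_m: float) -> float:
    """Fold the measured error into the next planned segment.

    Hypothetical correction policy, chosen only to illustrate step (3).
    """
    return planned_m + ponging_correction(target_m, measured_m)


# Example from the text: aiming for 1 m, but only 0.99 m was actually produced.
shortfall = ponging_correction(1.0, 0.99)        # about 0.01 m short
adjusted = next_segment_length(1.0, 1.0, 0.99)   # about 1.01 m planned next
```

The point is only the shape of the loop: measure the actual value, compare it with the desired value, then act on the difference.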

Pinging is Palette’s closed-loop control system for how much filament Palette should produce based on how much the printer is actually using. (We discussed pinging briefly in Mar 2016, and in our Kickstarter comments in Apr 2016.) Pinging is harder (and more important) to get right because -- as you’re probably aware -- printing is not a perfect science. All sorts of things can cause small differences between estimated and actual filament usage, like:

  • Skipping,
  • Filament diameter inconsistency,
  • Imperfect first-layer adhesion,
  • Moisture,
  • Filament brand/color physical property differences,
  • How often a printer stops/starts when extruding several small segments,
  • And many other factors.

G-code provides a baseline on how much filament printers expect to use, but the combined effects of all these other factors can result in printer extrusion variances that would be unreasonably large if Palette did not have its pinging system.

There are a few really hard problems with pinging. The first part is knowing what the desired and actual states of the system are (which we’ll discuss below). The second part is taking corrective actions so that Palette produces filament that is as close as possible to the desired lengths based on the actual filament consumption of the printer (we’ll discuss this side in tomorrow’s post).

Discovering the desired and actual states of the system

We designed Palette so you would not have to modify your printer (or your printer’s firmware) to print with Palette. This no-modifications approach is why Palette has no digital connection to printers. With that said, Palette needs to know how much filament a printer is using to make sure splice timing is optimal. Palette needs to detect a printer’s actual filament usage rates, which it will use to determine the desired filament production (and to adjust accordingly).

Palette measures how much filament the printer is using with its Scroll Wheel (discussed in-depth in Sep 2015, Dec 2015; mentioned in May 2015, Oct 2015, Nov 2015). Hundreds, if not thousands, of hours have gone into designing and iterating on the Scroll Wheel -- and it is now incredibly robust (and as a result, Palette has a very reliable tool to measure filament movement).

But Palette needs to know more than just how much filament a printer is using. Palette needs to know how much filament a printer has used at certain points in the print, because in isolation, knowing how much filament a printer is using is not enough to fully understand the accuracy and precision of a printer’s rate of filament consumption. It needs a system to say, “hey, we’re 2% of the way through the print, and we’ve used 0.97m of filament, but we should actually have used 1m of filament by now.”

The Scroll Wheel tells us how much filament the printer has actually used, but we need something else that can tell us how much the printer was supposed to use. That’s where pings come in.

Pings are checkpoints throughout the print. Pings are how a printer tells Palette: “Hey, I’m 1.05% of the way through the print. Now I’m at 2.03%... 3.16%...” At every checkpoint, Palette compares how much filament the printer actually used to how much the G-code says it should have used. Palette then uses this information to create filament segments that are likely closer to what a printer actually needs.
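As a rough sketch of that checkpoint comparison: given cumulative expected usage (from the G-code) and cumulative actual usage (from the Scroll Wheel) at each ping, we can compute how the printer behaved between consecutive checkpoints. The function name and the ratio definition are our assumptions for illustration, not the exact calculation Palette performs.

```python
from typing import List, Tuple


def checkpoint_ratios(checkpoints: List[Tuple[float, float]]) -> List[float]:
    """Return actual/expected usage for each stretch between checkpoints.

    `checkpoints` holds cumulative (expected_mm, actual_mm) pairs, one per ping.
    A ratio below 1.0 means the printer used less filament than the G-code
    predicted over that stretch.
    """
    ratios = []
    prev_expected, prev_actual = 0.0, 0.0
    for expected, actual in checkpoints:
        ratios.append((actual - prev_actual) / (expected - prev_expected))
        prev_expected, prev_actual = expected, actual
    return ratios


# First ping: 970 mm used where 1000 mm was expected (the 0.97m vs 1m example).
# Second ping: the printer used exactly the expected 1000 mm in between.
ratios = checkpoint_ratios([(1000.0, 970.0), (2000.0, 1970.0)])
```

Looking at each stretch separately, rather than only at the running total, shows whether the printer is still drifting or whether an early error is simply being carried along.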

Okay, so pings are checkpoints throughout a print. How does a printer ping? How does Palette detect them? Why were they not as reliable for some of Palette’s earliest backers, and what changes have been made to make them more reliable?

Let’s start with what we know: we know how much filament is passing the Scroll Wheel (“SW”) -- either forward or backward, and we can control how frequently this information is evaluated. Palette currently checks filament usage every 0.5 seconds. Each time it checks this information, Palette can run some mathematical functions based on the counts value that the SW has turned during the last 0.5s. (Counts are the SW’s unit of measurement; there are roughly 30 counts per 1 mm.)
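For readers who want to translate counts into familiar units, here is a small sketch using the two figures above (roughly 30 counts per mm, one reading every 0.5 seconds). The constant and function names are ours, and "roughly 30" is treated as exactly 30 for simplicity.

```python
COUNTS_PER_MM = 30.0     # approximate figure from the text, assumed exact here
SAMPLE_PERIOD_S = 0.5    # Palette checks the Scroll Wheel every half second


def counts_to_mm(counts: float) -> float:
    """Convert Scroll Wheel counts to millimetres of filament."""
    return counts / COUNTS_PER_MM


def feed_rate_mm_per_s(counts_in_sample: float) -> float:
    """Filament speed implied by a single half-second reading."""
    return counts_to_mm(counts_in_sample) / SAMPLE_PERIOD_S


length_mm = counts_to_mm(600)        # 600 counts is 20 mm of filament
speed = feed_rate_mm_per_s(30)       # 30 counts in 0.5 s is 2 mm/s
```

Negative counts (filament moving backward) convert the same way and simply give negative lengths and speeds.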

We conducted tests across a variety of printers (both direct drive/cartesian and Bowden/delta) to identify the tools we could use to manipulate printers’ behavior in ways that the SW could measure reliably. Based on our tests, we can assume that movement in the SW is caused by loading, extrusion, and retraction. A lack of movement in the SW is caused by pausing (G-code has a universal pause command, G4: Dwell), or by the extruder moving from one point to another on the bed while not extruding. We ultimately decided not to use retraction in pings because its reliability and measurability varied too much across different systems. (Retraction was very reliable and measurable on Bowden systems. However, direct drive systems could not reliably tolerate retractions large enough to be measured, so we could not justify their use in pings.)

Got it. Pings are checkpoints throughout a print. We know the causes of movement in the SW, and the tools we can manipulate to cause this movement. What’s next?

The next step is to create a ping signature: the series of actions that a printer undertakes at checkpoints throughout the print. The ping signature must be unique enough so the SW can detect pings while minimizing false positive and false negative pings.

A false positive is when something is detected that wasn’t supposed to be detected.

A false negative is when something should have been detected, but is not.

Okay, so a ping signature is basically a series of actions for a printer. How do we instruct printers to execute these actions?

We insert pings by adding G-code commands with our application, Mosaic Standalone. These G-code commands are how the printers know when to ping (and what a ping looks like).

Your current 3D printing process probably looks something like:
Model → Slice → Start Print

The process to 3D print with Palette is currently:
Model → Slice → Process G-code in our Software → Start Palette → Start Print

We automatically add pings during the G-code processing step (don’t worry -- once you’ve set it up, our software is pretty darn easy to use).

Great. That’s all fine and dandy. What did ping signatures look like before? How were we detecting them?

(Warning: incoming math/stats!)

Previously, we didn’t worry too much about what ping signatures looked like. Because of how we had written the algorithm that adjusted filament segment lengths, we operated off the assumption that we didn’t need to pick up that many pings. We thought that as long as Palette picked up even 1 out of 5 pings, if pings were frequent enough, then that would provide sufficient information for Palette to make the necessary filament adjustments within reasonable tolerances. So the ping signature was pretty simple. It used to be: a 3-second pause, ~10mm of extrusion, and then another 3-second pause.

As for how we detected pings: because pings were two pauses with 10mm of extrusion between them, we’d look first for the pauses, and calculate the amount of extrusion between the two pauses to see if a signature was present. To detect pauses, we used to look for no more than a certain amount of movement (between -2 and 5 counts) in the SW for a certain amount of time. Have a look at the figures in the bottom right of the video below from one of our internal tests, and notice how the figures change as the printer moves:

Readings for a standard print are typically around 20 counts per half second. (Note that the color is way off in this print. This test shows that even in small prints, splice timing is way off without pinging.)

Okay, so -- how was Palette detecting pings?  If the conditions were met for a pause measuring at least 2 seconds, then we’d look for ~10mm (~300 counts) of extrusion by looking for 150 to 450 counts of movement in the SW, and then another pause of at least 2 seconds. Then, to minimize false positives (stuff that looks like it could be a ping, but isn’t actually a ping), we added ping gates. If we were expecting a ping at a cumulative sum of 10,000 SW counts, then we would only accept a ping if it were at 10,000 counts plus or minus a certain error. If we met all the conditions (pause, extrusion, pause, all within the ping gates), then we would treat this as a ping.
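The old rules can be sketched roughly as follows. This is a simplified reconstruction from the description above, not the firmware itself: the function names are ours, noise counts during pauses are ignored in the running total, and any single quiet reading splits a stretch of movement (the real filters may have handled both differently).

```python
from typing import List, Tuple

SAMPLE_PERIOD_S = 0.5
PAUSE_LOW, PAUSE_HIGH = -2, 5             # a reading in this range looks like "no movement"
MIN_PAUSE_S = 2.0                         # each pause had to last at least 2 seconds
EXTRUSION_LOW, EXTRUSION_HIGH = 150, 450  # ~10 mm (~300 counts), with tolerance


def segment(readings: List[int]) -> List[Tuple[str, float]]:
    """Group readings into runs: ('pause', seconds) or ('move', counts)."""
    runs: List[Tuple[str, float]] = []
    for r in readings:
        kind = "pause" if PAUSE_LOW <= r <= PAUSE_HIGH else "move"
        value = SAMPLE_PERIOD_S if kind == "pause" else r
        if runs and runs[-1][0] == kind:
            runs[-1] = (kind, runs[-1][1] + value)
        else:
            runs.append((kind, value))
    return runs


def detect_old_ping(readings: List[int], expected_counts: float, gate_counts: float) -> bool:
    """True if a pause / extrusion / pause pattern lands inside the ping gate."""
    runs = segment(readings)
    cumulative = 0.0
    for i, (kind, value) in enumerate(runs):
        if kind != "move":
            continue
        cumulative += value
        if not 0 < i < len(runs) - 1:
            continue  # needs a pause on both sides (runs alternate by construction)
        pauses_ok = runs[i - 1][1] >= MIN_PAUSE_S and runs[i + 1][1] >= MIN_PAUSE_S
        extrusion_ok = EXTRUSION_LOW <= value <= EXTRUSION_HIGH
        in_gate = abs(cumulative - expected_counts) <= gate_counts
        if pauses_ok and extrusion_ok and in_gate:
            return True
    return False


# Normal printing, then 3 s pause, 300 counts of extrusion, 3 s pause, more printing.
trace = [20] * 100 + [0] * 6 + [60] * 5 + [0] * 6 + [20] * 10
found = detect_old_ping(trace, expected_counts=2300, gate_counts=100)
```

Note how permissive the pause test is: any reading from -2 to 5 counts qualifies, which is exactly the kind of rule that slow or slack-damped movement can satisfy by accident.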

On most prints, we found this to be sufficient. We’d miss several pings, but so long as we correctly detected a ping every now and then, we thought everything should be okay -- and for the most part, it was. There are a couple reasons we believed this solution was sufficient:

1. We believed that as prints progressed, extrusion was roughly uniform beyond the first layer. We thought that variations in extrusion rates were mostly caused by poor first-layer adhesion, which led us to think: “okay, as long as we catch a few pings after the first few layers, then the prints should be fine.” This hypothesis turned out to be mostly true (statistically, the biggest variances are still in the earliest layers), but we’ve since discovered that there’s enough variation in higher layers of prints to throw them off calibration without more frequent pings. We see variations in extrusion rates of as high as ~1% for parts of the print that are right next to each other. Compare the “Local Offset” figures circled in the image below. (Local Offset figures are essentially the printer’s extrusion rates during specific segments of the print.):

The difference in these two rates is ~1.2%. Extrusion rates varied by ~1-1.5% in a few consecutive portions of this print. If unaccounted for, these variances would show up as incorrect filaments on the print.

2. Because of how we used to determine future optimal filament segment lengths, it was okay to miss a few pings. (We’ve since changed this algorithm to be more accurate, which we’ll discuss tomorrow!)

3. Ping signatures and ping detection are all in software and firmware, so we would continue to improve them through software/firmware updates. But we believed they were robust enough to ship.

What we didn’t realize is that we were also picking up some false positives (movements that looked like pings but were not actually pings). Have a look at footage from one of our internal tests: (10:52 to 11:03). During this test, despite the printer extruding, there was a small amount of slack in the tube constraining the filament, which caused the SW readings to be low enough to trigger a pause (using our old filters). We see the same thing here: (5:03 to 5:09 -- note the 1-2-0-1-0-1-0...). Despite the print moving, our old algorithms would have thought this was a pause. But pauses are actually supposed to look like this: (8:35 to 8:43).

In the first and second videos, the print is moving, but using the old pause and ping detection filters, Palette would occasionally pick up false positive pings (looked like a ping, but wasn’t actually a ping) when we got very unlucky (two fake 2+ second pauses, 150 to 450 counts apart from each other, inside the ping gates). False positives have a very high likelihood of ruining prints because they can throw off calibration substantially.

So, what are the implications of the old ping signature and its detection, and what does this mean for new pings?

Well, we’ve changed our algorithms on how to adjust filament lengths -- but these new algorithms rely on more frequent pings. The result of these new algorithms is far more accurate splice timing, but to take advantage of these algorithms, we need more accurate ping detection: fewer false negatives (missed pings), and practically 0 false positives (looks like a ping, but isn’t one).

We’ve also changed how we detect pauses. As we show in the videos above, there were instances of (1) noise during pauses (which caused them to be missed -- false negative pauses), and (2) movement that looked like pauses (false positive pauses). This made us realize that Palette needed cleaner filters for pause detection to account for this noise.

This led to a few changes in how we look at pauses. We separated pauses into two distinct events: a start, and a stop.

  • To minimize false positive pauses (readings that look like a pause when the printer is not actually pausing), we made the rules to Start pauses more strict.
  • Then, to minimize the likelihood that pauses are incorrectly stopped due to noise, we altered the rules to make it more difficult to Stop pauses (but any actual extrusion will still interrupt pauses).

Overall, these two changes increase the likelihood that pauses detected are actually pauses, but they require pauses to be a bit longer since it takes several consecutive readings within certain boundaries to start a pause. Because of this change, we increased the minimum length of a pause by a few seconds (old pauses used to be 3s, now, the shortest pause is 7s) so that we can minimize the number of false negative (missed) pauses.

We’ve also experimented with adding some more depth to how we think about pauses. Since detecting more pings (and more pauses) is more important now (whereas before, missed pings [and missed pauses] were less critical), it’s become more important to filter out the noise in the SW data.

We explored different ways on how to filter out noise from the data. For example, check out columns E (“SW output”) and J (“Average of last 4 readings”) below:

Column E is the unaltered Scroll Wheel data, and column J is the average of the last 4 readings.

Too small on your screen?  Full size image: link.

Have a look at the blue bars in columns E and J. These represent the magnitude of each figure. You’ll notice that the blue bars in the Average of the last 4 readings column are more regular. That’s because averaging the readings helps flatten the peaks and troughs, which tend to be noise. (If you’d like to see what’s actually happening to the print during these SW readings, check out 3:42 - 3:59. The sudden 3-3-1-1-3-8 is the result of flicking the tube -- we were curious what would happen if someone knocked the Teflon tubes near the SW during a print. The conclusion: In most cases, it should be fine, but try not to mess with the tubes during prints!) We take advantage of these learnings by using the average of the last 4 SW readings in the rules for starting and stopping pauses.
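Here is a minimal sketch of that averaging step, run on the 3-3-1-1-3-8 tube-flick readings mentioned above. The function name is ours, and averaging over fewer than 4 readings at the very start of the data is our assumption (the spreadsheet may treat the first rows differently).

```python
from collections import deque
from typing import List


def rolling_average(readings: List[float], window: int = 4) -> List[float]:
    """Average of the most recent `window` readings, one value per reading."""
    recent = deque(maxlen=window)   # old readings drop off automatically
    averages = []
    for r in readings:
        recent.append(r)
        averages.append(sum(recent) / len(recent))
    return averages


flick = [3, 3, 1, 1, 3, 8]
smoothed = rolling_average(flick)
# The raw spike of 8 becomes (1 + 1 + 3 + 8) / 4 = 3.25 after averaging.
```

A single noisy reading moves the average by only a quarter of its size, which is why the averaged column looks flatter than the raw one.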

Now, pause detection is much cleaner. Have a look at our pause detection in the simulation:

Note the separate criteria to Start and Stop pauses

Too small on your screen? Full-size image: link.

Here’s the gist of the new pause rules -- starting a pause requires 4 consecutive SW readings between -1 and +1 counts, inclusive: [-1, 1]. Stopping a pause requires the average of the last 4 readings to be outside -2 to +2 counts, and at least 3 consecutive readings that are not between -2 and +2 counts. (Where possible, we designed the simulation to be parametric so we could “goal seek” to find optimal values for these boundaries. Goal seeking is a feature in Excel that iteratively helps find optimal values for a dependent cell by changing another cell. It’s like programmatically changing your m-value to find a specific y-value in y = mx+b.)
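The Start and Stop rules can be sketched as a small state machine. This is our reading of the rules above, not the firmware: the function name is ours, and how the pause duration is measured (counting the 4 readings that started the pause, excluding the readings that ended it) is an assumption.

```python
from collections import deque
from typing import List


def detect_pauses(readings: List[int], sample_period_s: float = 0.5) -> List[float]:
    """Return the durations (seconds) of completed pauses found in the readings."""
    recent = deque(maxlen=4)   # last 4 readings, used for the Stop average
    in_pause = False
    quiet_run = 0              # consecutive readings inside [-1, 1]
    loud_run = 0               # consecutive readings outside [-2, 2]
    pause_samples = 0
    pauses = []
    for r in readings:
        recent.append(r)
        quiet_run = quiet_run + 1 if -1 <= r <= 1 else 0
        loud_run = loud_run + 1 if not -2 <= r <= 2 else 0
        if not in_pause:
            # Start rule: 4 consecutive readings within [-1, 1].
            if quiet_run >= 4:
                in_pause = True
                pause_samples = 4
        else:
            pause_samples += 1
            average = sum(recent) / len(recent)
            # Stop rule: 3 consecutive loud readings AND a loud 4-reading average.
            if loud_run >= 3 and not -2 <= average <= 2:
                in_pause = False
                pauses.append((pause_samples - loud_run) * sample_period_s)
    return pauses


clean = [20] * 5 + [0] * 14 + [20] * 5
noisy = [20] * 5 + [0] * 6 + [3] + [0] * 7 + [20] * 5   # one stray reading mid-pause
clean_pauses = detect_pauses(clean)
noisy_pauses = detect_pauses(noisy)
```

In the `noisy` trace, the single stray reading of 3 counts does not end the pause, because the Stop rule needs three loud readings in a row. Both traces come out as one 7-second pause.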

The point of all this is that (1) pauses are a little bit longer now, which helps us minimize false positives (looks like a pause but isn’t actually one). We’ve also (2) minimized false negative (missed) pauses, and (3) we can count on the measured duration of each pause to be more precise.

Got it. Palette can more accurately pick up pauses now. How does this fit into everything?

Well, now that we can use the duration of pauses with greater confidence, we’ve designed the new ping signature to take advantage of this.

Because pauses are more reliable, we can rely on the amount of extrusion between pauses with greater confidence as well.

Okay -- so what do ping signatures look like now? How did we come up with this signature? How are we detecting them? How well is pinging working now?

The new ping signature is a 13-second pause, followed by 20mm of extrusion, and then a 7-second pause. Pings will only occur on transition towers (so you don’t need to worry about dripping or oozing on your model).

Now, we realize this adds some additional time to prints. An average 75m print (~12 hours) would include roughly 80 pings, and at 20 seconds of pausing each, that’s 25-30 minutes (~4%) of extra print time for pings. Hopefully, you agree that ~4% additional print time is worthwhile to keep your prints properly calibrated. Over time, we’ll continue to try and tune this to see how we can minimize the time required for pinging without negatively affecting its reliability.
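The overhead estimate is simple arithmetic, sketched here so you can plug in your own print. The function name is ours, and it only counts the pause time (13s + 7s per ping), as the estimate above does. The time spent extruding the 20mm between pauses is not included.

```python
from typing import Tuple


def ping_time_overhead(num_pings: int, pause_seconds_per_ping: float,
                       print_hours: float) -> Tuple[float, float]:
    """Return (extra minutes, extra time as a fraction of the print)."""
    extra_minutes = num_pings * pause_seconds_per_ping / 60
    fraction = extra_minutes / (print_hours * 60)
    return extra_minutes, fraction


# 80 pings, 13 s + 7 s of pausing each, on a 12-hour print.
extra_minutes, fraction = ping_time_overhead(80, 13 + 7, 12)
# extra_minutes is about 26.7; fraction is about 0.037, i.e. roughly 4%.
```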

Right, so -- how’d we come up with 13s, 20mm, 7s? Well, this is connected to how we’re detecting pings. Because every component of the ping signature (pause, extrusion, pause) is now more precise and accurate, we’ve moved toward weighted average, confidence-based ping detection.

Our old way of thinking was binary: “Did we pick up a pause, extrusion, and a pause? If yes for all 3, then that’s a ping. Otherwise, No.” Now, we’re analyzing the SW data in a more nuanced way before making our “yes or no” decision. Palette is now thinking: “Was that pause roughly the right duration? Was that extrusion approximately the amount we were expecting? Was that second pause also similar in duration to what we were expecting?”

The idea of moving toward a weighted average/confidence-based system is that we should be able to be much more scientific around ping detection (e.g., X% of pings should score Y% or higher, and there should never be any scores above Z% that are not actual pings).

So, how does this whole weighted average thing work?

Basically, we separate the ping signature’s 3 components and allocate them different weightings. Recall that the components are: pause 1 (13s), extrusion (20mm), and pause 2 (7s). The first pause is more reliable than the second pause (if a pause is recorded as being 1 second too short or long, 1 second matters less for a 13-second pause than it does for a 7-second pause, since 1/13 is a smaller error than 1/7). The extrusion amount is also quite consistent. So we’ve allocated weightings of 35% to the first pause, 35% to the extrusion, and 30% to the second pause (for a total of 100%).

The simulation was also parametric for testing ping signatures. Here are some of the inputs we used for goal seeking:

Parameterizing the ping detection simulation allowed us to simulate ping detection for different ping signatures quickly.

So let’s say a print is running: we’re measuring movement in the SW, then all of a sudden, the printer starts to ping. Since we know that every ping ends in a pause, we only calculate ping scores at the end of each pause.

Our filters pick up a 12-second pause. The pause ends, so Palette checks the ping score, but it’s low. The score is low because the last pause of a ping signature is supposed to be 7 seconds (the ping signature is 13s, 20mm, 7s). Also, the extrusion between the 12-second pause and the last detected pause before it is beyond a certain gate (this gate is well in excess of 20mm).

Okay, the score wasn’t very high, so it is very unlikely that the 12-second pause was the last segment of a ping. Palette will assess this information and say “nope, not a ping.” It will store the 12s pause for later and keep looking forward.

Now Palette sees 583 counts of extrusion (~19mm), followed by a 9-second pause. Again, immediately after the pause ends, Palette calculates the ping score.

To calculate the score for each segment, Palette takes: 1 minus the absolute value of the percentage difference between (1) the actual value measured and (2) the value that Palette was expecting. It looks something like this:

Segment Score = 1 - | (ActualValue - ExpectedValue) / ExpectedValue |

So, if its filters pick up a 12-second pause when Palette was expecting a 13-second pause, Palette will score it as follows:

Pause1 Score = 1 - |(12-13)/13|
Pause1 Score = 1 - |-1/13|
Pause1 Score = 1 - 1/13
Pause1 Score = 0.923 or 92.3% for the first pause.

Since it picked up 583 counts of extrusion, and Palette was expecting 600 counts (20mm * 30 counts/mm), this segment scores: 1 - |(583-600)/600| = 0.972 or 97.2%. Not bad!

And for our 9-second pause? It was actually supposed to be 7s, but something weird happened, and that’s why it clocked in at 9 seconds. That gives us:

Pause2 Score = 1 - |(9-7)/7| = 0.714 or 71.4%.  Hmm -- not great, but not too shabby either.

Overall, this gives us a weighted average ping score of (0.35*0.923)+(0.35*0.972)+(0.3*0.714) =0.877 or 87.7%.

If this score exceeds the threshold to trigger a ping, then Palette will treat this as a ping. The threshold is around 55-60%, so Palette says: “yup, that score is high enough, it’s a ping.”
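The worked example above can be reproduced with a short sketch. The weights, expected values, and formula come from this post. The dictionary layout, function names, and the choice of 0.60 as the threshold (the text only says "around 55-60%") are our assumptions. Note also that this formula goes negative when a measurement is more than double its expected value. We do not know whether Palette clamps such scores, so the sketch leaves them as-is.

```python
from typing import Dict

WEIGHTS = {"pause1": 0.35, "extrusion": 0.35, "pause2": 0.30}
EXPECTED = {"pause1": 13.0, "extrusion": 600.0, "pause2": 7.0}  # s, counts, s
THRESHOLD = 0.60   # assumed: upper end of the "55-60%" range


def segment_score(actual: float, expected: float) -> float:
    """1 minus the absolute percentage difference from the expected value."""
    return 1 - abs((actual - expected) / expected)


def ping_score(measured: Dict[str, float]) -> float:
    """Weighted average of the three segment scores."""
    return sum(WEIGHTS[k] * segment_score(measured[k], EXPECTED[k]) for k in WEIGHTS)


def is_ping(measured: Dict[str, float], threshold: float = THRESHOLD) -> bool:
    """A candidate counts as a ping only if its score exceeds the threshold."""
    return ping_score(measured) > threshold


# The example from the text: 12 s pause, 583 counts of extrusion, 9 s pause.
candidate = {"pause1": 12.0, "extrusion": 583.0, "pause2": 9.0}
score = ping_score(candidate)     # about 0.877
accepted = is_ping(candidate)     # True: well above the threshold
```

Because each segment contributes in proportion to how close it was, one badly measured segment lowers the score without automatically discarding an otherwise clean ping.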

...and that’s how Palette detects pings!

The result is a far more robust system. Our simulation helped us understand how Palette would score pings with these new filters. Have a look at how the Ping Detection Simulation would treat an average ping signature here: Image. Note that the filters detected a 12.5s pause, 652 counts of extrusion, then a 6.5s pause. This resulted in a score of 0.938 or 93.8%, which is pretty standard.

Even when pauses are messed up, the scores are still fairly high. In one instance, the second pause was detected as only 4.5s instead of 7s and the resulting ping score was still 86.4%.

In another instance, the first pause was only measured as 9.5s instead of 13s and still scored 86.5%. Even when the first pause was detected as only 7.5s, a ping still achieved a score of 79.1%.

With the new algorithms, it’s very common to see score distributions where there’s a big difference between the worst-scoring real ping, and the highest-scoring thing that looks like a ping but isn’t one:

Note the 0.291519 (~30%) difference between the worst-scoring real ping, and the highest-scoring instance that looks like a ping but isn’t.

Too small on your screen? Full size image: link.

False negative (missed) pings have not completely vanished, but they’re incredibly rare now. False positives are practically unheard of (we haven’t had a single one since implementing the new filters). We’ve had several prints with 50-80+ pings where we correctly pick up every single ping, and even when we do miss a ping (which is rare), our corrective algorithms should account for this, and in most cases, this will not affect the print.

Okay, so what does this all mean?

All this means that pinging is a much more reliable system now -- which means that Palette is far more reliable and consistent for longer prints!

Tomorrow, we’ll cover how Palette actually uses pings to adjust how much filament it’s producing.

TL;DR (Too Long; Didn’t Read) - Summary

In this post, we share how we think about problems. We discuss some reasons why 3D printing is not a perfect science (and what this means for printing with Palette). We share details about Palette’s two closed-loop control systems that help ensure the right filament shows up at the right time. We share reasons why one of these systems, pinging, caused issues for some of our earliest backers. We share how we used math & stats to make pings (checkpoints throughout the print) more reliable, which included new measures to minimize false positives (stuff that looks like a ping but isn’t one), and false negatives (missed pings). Pinging is now working very well, and in tomorrow’s post, we’ll share how Palette uses the information from pings to make corrections in filament length.

That’s all for now!

Please let us know if you have any questions, comments, etc. We’d love to hear from you. Thank you!

-Mosaic.