---
title: "Characterization of Space Debris using Machine Learning Methods"
subtitle: "Advanced processing of 3D meshes using Julia, and data science in Matlab."
author:
  - name: Anson Biggs
    url: https://ansonbiggs.com

date: "4/30/2022"
thanks: Mehran Andalibi

latex-auto-install: true
format:
  html:
    self-contained: true
    title-block-banner: true
  pdf: default

reference-location: margin
citation-location: margin
bibliography: citations.bib

execute:
  cache: true

filters:
  - filter.lua
# theme: litera
---

## Introduction

Orbital debris is a form of pollution that is growing at an exponential pace and puts future space infrastructure at risk. Satellites are critical to military, commercial, and civil operations. Unfortunately, the space that debris occupies is becoming increasingly crowded and dangerous, potentially leading to a cascade event that could turn orbit around the Earth into an unusable wasteland for decades unless proper mitigation is introduced. Existing models employed by NASA rely on a dataset created from 2D images and are missing many crucial features required for correctly modeling the space debris environment. This approach aims to use high-resolution 3D scanning to fully capture the geometry of a piece of debris and allow a more advanced analysis of each piece. Coupled with machine learning methods, the scans will allow advances to the current cutting edge. Physical and photograph-based measurements are time-consuming, hard to replicate, and lack precision. 3D scanning allows much more advanced and accurate analysis of each debris sample. Once the characteristics of space debris are more thoroughly understood, we can begin mitigating the creation and danger of future space debris by implementing improved satellite construction methods and more advanced debris avoidance measures.

## Current Progress

This project aims to solve some very difficult problems, and although great progress has been made, there is still plenty of work to be done. Currently, algorithms have been developed that are capable of extracting many key features from solid^[A mesh with a surface that is fully closed and has no holes in its geometry.] models in the `stl` format. Processing the 3D scans produced by the scanner in the RPL^[Rapid Prototyping Lab, STEM Building] is no easy task and something I chose to avoid for now. I suspect that the best way to work with data from the scanner will be to export a point cloud from the scanner software, then use the points to create a triangulated mesh. The number of scans needed for machine learning to capture all possible debris geometry means that the only realistic way to reach the required order of magnitude is for taking the scans to be the only part of the workflow that is not fully automated by code. My plan for the semester was to do most of the mesh processing in CATIA, which proved to take far too long and was unable to scale.

## stlProcess Algorithm Breakdown

The algorithm for processing the 3D meshes is implemented in the Julia programming language. Syntactically, the language is very similar to Python and Matlab, but it was chosen because it is as performant as compiled languages like C while still having tooling geared towards engineers and scientists. The current code is quite robust and well tested, and for a given `stl` file it produces the following data:

```julia
struct Properties
    # Volume of the mesh
    volume::Float64
    # Center of gravity; meshes are not always centered at [0,0,0]
    center_of_gravity::Vector{Float64}
    # Moment of inertia tensor
    inertia::Matrix{Float64}
    # Surface area of the mesh
    surface_area::Float64
    # Average orthogonal dimension of the mesh
    characteristic_length::Float64
    # Projected lengths between the farthest two points in the [x,y,z] directions
    solidbody_values::Vector{Float64}
end
```

### Volume, Center of Gravity, Inertia

These properties all come from [@eberlyPolyhedralMassProperties2002], which provides a great explanation of the algorithm. The algorithm is quite complicated, but it is robust, accurate, and vectorized, which makes it extremely fast. All the other algorithms are implemented from scratch by me and have plenty of room for improvement, but the algorithm for these properties is probably as good as it can get.

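The details are beyond the scope of this report, but to give a flavor of how quantities are accumulated over a triangulated surface, here is a minimal sketch, not Eberly's actual algorithm (which also produces the inertia tensor), that computes volume and center of gravity from signed tetrahedra. The vertex lists `v0`, `v1`, `v2` are assumed inputs holding the corners of each triangle:

```julia
using LinearAlgebra

# Sketch only: volume and center of gravity via signed tetrahedra.
# v0, v1, v2 are assumed vectors where element i is a 3-element vector
# holding one corner of triangle i.
function volume_cog_sketch(v0, v1, v2)
    volume = 0.0
    moment = zeros(3)
    for i in eachindex(v0)
        # Signed volume of the tetrahedron formed by the triangle and the origin
        tet = dot(v0[i], cross(v1[i], v2[i])) / 6
        volume += tet
        # The centroid of a tetrahedron is the average of its four corners
        moment += tet * (v0[i] + v1[i] + v2[i]) / 4
    end
    return volume, moment / volume
end
```
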
### Surface Area

Calculating the surface area is currently very simple but does have some shortcomings. The whole algorithm is one line of code:

```julia
surface_area = sum(norm.(eachrow([x0 y0 z0] - [x1 y1 z1]) .× eachrow([x1 y1 z1] - [x2 y2 z2])) / 2)
```


The algorithm finds the area of the triangle made up by 3 points, $\mathbf{ABC}$, in 3D space using vector calculus. The area of a triangle is $S=\dfrac{|\mathbf{AB}\times\mathbf{AC}|}{2}$, and the sum over all the triangles that make up the mesh then produces the surface area. This can be inaccurate if a mesh is missing triangles, has triangles that overlap, or has internal geometry, none of which should be an issue for 3D scans.

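For readers less familiar with Julia's broadcasting syntax, the one-liner above is equivalent to the following loop-based sketch (the matrices `v0`, `v1`, `v2` are hypothetical, with row $i$ holding one corner of triangle $i$):

```julia
using LinearAlgebra

# Loop-based equivalent of the one-line surface area calculation above.
# v0, v1, v2 are assumed n×3 matrices; row i holds one corner of triangle i.
function surface_area_sketch(v0, v1, v2)
    total = 0.0
    for i in axes(v0, 1)
        AB = v1[i, :] - v0[i, :]
        AC = v2[i, :] - v0[i, :]
        # Area of one triangle: |AB × AC| / 2
        total += norm(cross(AB, AC)) / 2
    end
    return total
end
```
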
### Characteristic Length

This is a property that is heavily used by the DebriSat^[@DebriSat2019] program, so it is definitely worth having, especially since DebriSat is a very interesting partner for this project. It is unclear whether the current implementation matches the value that DebriSat uses. Wikipedia gives a different definition^[[Wikipedia: Characteristic Length](https://en.wikipedia.org/wiki/Characteristic_length)] that defines characteristic length as the volume divided by the surface area, and DebriSat mentions it in various papers with slightly different definitions.

The current implementation is quite complicated and is the most involved algorithm used. The key fact that makes this algorithm work is that the eigenvectors of the inertia tensor are orthonormal and oriented so that they point in the directions of most mass. Assuming the mesh has uniform density, this means that the eigenvectors point in the direction of the points furthest from the center of gravity.

The algorithm begins by taking the eigenvectors, $\lambda$, and making vectors from the center of gravity to every point that makes up the mesh, $\bar{P}_i$. Then the angle, $\theta$, between each eigenvector and all the point vectors needs to be calculated as follows:

$$\theta_i = \arccos\left(\frac{\lambda \cdot \bar{P}_i}{\| \lambda \| \| \bar{P}_i \|}\right)$$

For almost every mesh there won't be a point that directly intersects the eigenvector, i.e. where $\theta \approx 0, \pi$. To deal with this, the 3 points with the smallest angle can be turned into a plane, and then the intersection of the eigenvector and the plane can be calculated^[I won't go into the depth of this math, but you can find a great explanation here: [@plane_line].]. This process then needs to be repeated for the 3 largest angles, giving two points between which a distance can be calculated for each eigenvector. The characteristic length, $L_C$, is then simply the average of the distances $d_1$, $d_2$, $d_3$ found for the three eigenvectors:

$$L_C = \frac{d_1 + d_2 + d_3}{3}$$

This algorithm will run into an error if a mesh is flat or 2D, but it has error handling, so it fails gracefully. Scans should never have this problem, so it likely isn't worth worrying about. There is likely a much faster analytical method to find the intersection, but for now it is performant enough.

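A simplified sketch of the idea follows. Instead of the plane intersection described above, it just uses the single closest vertex on each side of the eigenvector, so it is an approximation for illustration rather than the actual implementation:

```julia
using LinearAlgebra
using Statistics

# Simplified sketch: approximate the span of the mesh along one eigenvector by
# taking the vertices whose direction from the center of gravity is closest to
# the eigenvector (θ ≈ 0) and to its opposite (θ ≈ π). The real implementation
# intersects the eigenvector with a plane through the three closest points.
function eigenvector_span_sketch(points, cog, v)
    θ = [acos(clamp(dot(v, p - cog) / (norm(v) * norm(p - cog)), -1, 1)) for p in points]
    return norm(points[argmin(θ)] - points[argmax(θ)])
end

# L_C is then the mean span over the three eigenvectors of the inertia tensor:
characteristic_length_sketch(points, cog, inertia) =
    mean(eigenvector_span_sketch(points, cog, v) for v in eachcol(eigvecs(inertia)))
```
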
### Solid Body Values

The current algorithm just takes all the points that make up the mesh, finds the `minmax` for each axis, and then averages them as in the figure below. No consideration is taken for the orientation of the model, but I believe that shouldn't matter. This is definitely an algorithm that deserves a revisit.

![Calculation of Solid Body Values, @hillMeasurementSatelliteImpact [slide 9]](Figures/solid_body.png)
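A minimal sketch of the per-axis projected lengths might look like this (`points` is a hypothetical list of mesh vertices; the averaging step illustrated in the figure is left out):

```julia
# Sketch only: projected length between the farthest two points on each axis.
# points is assumed to be a vector of 3-element vertex coordinates.
solid_body_sketch(points) =
    [maximum(p[i] for p in points) - minimum(p[i] for p in points) for i in 1:3]
```
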
### `find_scale` and `fast_volume` Functions

These functions are used together to scale the models for machine learning. When performing machine learning on a dataset with properties of varying magnitudes, it is important to do some sort of scaling. Scaling data can be more of an art than a science, but it was determined that scaling by volume was optimal for many reasons, mainly because the algorithm to calculate the volume is one of the fastest. The `fast_volume` function is just a modified version of the main volume function with all the extra calculations required for center of gravity and inertia removed. The `find_scale` function uses the `fast_volume` function and a derivative-free root-finding algorithm to find a scaling factor that produces a volume of 1 for a model. While iterative, this process works incredibly fast and, with the current dataset, only adds about 20% to the computation time needed to calculate all the properties for a mesh.

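As a rough illustration of the approach, `find_scale` could be sketched as follows. This assumes `fast_volume(stl; scale)` returns the volume of the mesh with its vertices multiplied by `scale`, and uses `find_zero` from the Roots.jl package as the derivative-free root finder:

```julia
using Roots

# Sketch: find the scale factor s such that the scaled mesh has a volume of 1.
# Volume grows with the cube of s, so the root is well behaved near s = 1.
function find_scale_sketch(stl)
    f(s) = fast_volume(stl; scale=s) - 1.0
    return find_zero(f, 1.0, Order0())  # derivative-free root finding
end
```
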
## stlProcess Workflow

Using Julia to process the meshes and make a `csv` to import into Matlab is a relatively straightforward process. The hardest part is getting up and running in Julia, for which I'd recommend the following resources:

- [Julia Getting Started Docs](https://docs.julialang.org/en/v1/manual/getting-started/)
- [Syntax Cheat Sheet](https://cheatsheets.quantecon.org/)
- [Julia for Matlabbers](https://projects.ansonbiggs.com/posts/2021-08-24-julia-for-matlabbers/)

### Imports

Unlike Matlab, which makes all installed toolboxes available by default, Julia packages have to be brought into scope.

```{julia}
# For reading stl files
using FileIO
using MeshIO

# The custom-made library discussed above
using stlProcess

# For making a csv of the data to be imported into Matlab
using CSV
using DataFrames

# For calculating eigenvectors
using LinearAlgebra
```

### `loop` Setup

To set up for the loop that processes all the `stl` files, a dataframe has to be initialized and the path to the subfolders containing all the `stl` files needs to be defined.

```{julia}
#| output: false

# local path to https://gitlab.com/orbital-debris-research/fake-satellite-dataset
dataset_path = raw"C:\Coding\fake-satellite-dataset"

folders = ["1_5U", "assembly1", "cubesat"]

# Columns for every piece of data you want to export
df = DataFrame(;
    volume=Float64[],
    surface_area=Float64[],
    characteristic_length=Float64[],
    bounding_box=Float64[],
    I1=Float64[],
    I2=Float64[],
    I3=Float64[],
)
```

### The `loop`

The actual loop is quite simple: it first iterates through all the subfolders in the main dataset folder, then through all the files in each subfolder. The returned `Properties` struct shown above then needs to be processed and added to the dataframe.

```{julia}
#| output: false

for path in dataset_path * "\\" .* folders
    Threads.@threads for file in readdir(path)

        # load the stl
        stl = load(path * "\\" * file)

        # normalize the stl
        scale = find_scale(stl)
        props = get_mass_properties(stl; scale=scale)

        I1, I2, I3 = eigvals(props.inertia)

        sbx, sby, sbz = props.sb_values
        bounding_box = sbx * sby * sbz

        push!(
            df,
            [props.volume, props.surface_area, props.characteristic_length, bounding_box, I3, I2, I1],
        )
    end
end
```

### Output

A summary of the dataframe can then be generated after the dataset is created. Note that the volumes are all `1.0`:

:::{.column-body-outset}

```{julia}
#| echo: false

describe(df)
```

:::

Finally, the dataset can be saved to a `csv` file for Matlab to import.

```julia
CSV.write("scaled_dataset.csv", df)
```

## Testing

The last key part of the Julia library is that it has tests written in the `test` directory. These tests take simple shapes for which all the above properties can easily be calculated by hand, and they automatically make sure that new versions of the code still compute the values correctly. Test-driven design is a powerful way to approach programming and is worth exploring, since it increases productivity and, if done correctly, demonstrates that the code is running correctly.

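As an illustration of the style (not the library's actual test suite), a test against a unit cube mesh might look like this; the file path and corner placement are assumptions, and `load` and `get_mass_properties` are assumed to be in scope as in the workflow above:

```julia
using Test

@testset "unit cube properties" begin
    # Hypothetical 1×1×1 cube mesh with corners at the origin and [1,1,1]
    cube = load("test/data/cube.stl")
    props = get_mass_properties(cube)

    @test props.volume ≈ 1.0
    @test props.surface_area ≈ 6.0
    @test props.center_of_gravity ≈ [0.5, 0.5, 0.5]
end
```
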
[Julia Testing Documentation](https://docs.julialang.org/en/v1/stdlib/Test/)
---
## Machine Learning

The rest of the document is an in-depth look at the progress made characterizing the current fake satellite dataset. The summary is that PCA determined that, out of the current list of properties, by far the most variance is captured by the principal moments of inertia.^[Eigenvalues of the inertia tensor.] I'm including this because it is already typed out and may be a good reference, but it likely isn't worth digging into once a new dataset of scans can be made.

### Gathering Data

To get started on the project before any scans of the actual debris were made available, I opted to find 3D models online and process them as if they were data collected by my team. GrabCAD is an excellent source of high-quality 3D models, and all the models have, at worst, a non-commercial license, making them suitable for this study. The current dataset uses three separate satellite assemblies found on GrabCAD; below is an example of one of the satellites that was used.


### Data Preparation

The models were processed in Blender, which quickly converted the assemblies to `stl` files, giving 108 unique parts to be processed. Since the final size of the dataset is expected to be on the order of thousands of parts, an algorithm capable of getting the required properties of each part is the only feasible solution. From the analysis performed in [Report 1](https://gitlab.com/orbital-debris-research/directed-study/report-1/-/blob/main/README.md), we know that the essential debris properties are the moments of inertia, which helped narrow down potential algorithms. Unfortunately, this is one of the more complicated things to calculate from a mesh, but thanks to the paper [Polyhedral Mass Properties](https://www.geometrictools.com/Documentation/PolyhedralMassProperties.pdf) [@eberlyPolyhedralMassProperties2002], the algorithm could be implemented in the Julia programming language. The current implementation of the algorithm calculates a moment of inertia tensor, volume, center of gravity, characteristic length, and solid body dimensions in a few milliseconds per part. The library can be found [here.](https://gitlab.com/MisterBiggs/stl-process) The characteristic length is a value that is heavily used by the NASA DebriSat project [@DebriSat2019], which is doing very similar work to this project. The characteristic length takes the maximum orthogonal dimensions of a body, sums them, and divides by 3 to produce a single scalar value that gives an idea of the size of a 3D object.

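For example, under that definition a body whose maximum orthogonal dimensions are $2 \times 1 \times 0.5$ would have $L_C = (2 + 1 + 0.5)/3 \approx 1.17$.
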


The algorithm's speed is critical, not only because of the eventual large number of debris pieces that have to be processed, but also because many of the data science algorithms we plan on running on the compiled data need the data to be normalized. For the current dataset and properties, it makes the most sense to normalize the dataset based on volume. Volume was chosen for multiple reasons, namely because it was easy to implement an efficient algorithm to calculate volume and because, currently, volume produces the least amount of variation out of the current set of properties calculated. Unfortunately, scaling a model to a specific volume is an iterative process, but it can be done very efficiently using derivative-free numerical root-finding algorithms. The current implementation can scale and process all the properties using only 30% more time than getting the properties without scaling first.

```txt
 Row │ variable               mean      min        median     max
─────┼────────────────────────────────────────────────────────────────
   1 │ surface_area           25.2002   5.60865    13.3338     159.406
   2 │ characteristic_length  79.5481   0.158521    1.55816   1582.23
   3 │ sbx                     1.40222  0.0417367   0.967078    10.0663
   4 │ sby                     3.3367   0.0125824   2.68461      9.68361
   5 │ sbz                     3.91184  0.29006     1.8185      14.7434
   6 │ Ix                      1.58725  0.0311782   0.23401     11.1335
   7 │ Iy                      3.74345  0.178598    1.01592     24.6735
   8 │ Iz                      5.20207  0.178686    1.742       32.0083
```

Above is a summary of the current 108 parts with scaling. Since all the volumes are the same, volume is left out of the dataset. The center of gravity is also left out since it is currently just an artifact of the `stl` file format; there are many ways to determine the 'center' of a 3D mesh, but since only one is implemented at the moment, comparisons to other properties don't make sense. The other notable aspect of the data is that each model is rotated so that the magnitudes of `Iz`, `Iy`, and `Ix` are in descending order, which makes sure that the rotation of a model doesn't matter for characterization. The dataset is available for download here:

- [scaled_dataset.csv](https://gitlab.com/orbital-debris-research/directed-study/report-3/-/blob/main/scaled_dataset.csv)
### Characterization

The first step toward characterization is to perform a principal component analysis to determine which properties of the data capture the most variation. `PCA` also requires that the data is scaled, so, as discussed above, the dataset scaled by `volume` will be used. `PCA` is implemented manually rather than with the Matlab built-in function, as shown below:

```matlab
% covariance matrix of data points
S = cov(scaled_data);

% eigenvalues of S
eig_vals = eig(S);

% sorting eigenvalues from largest to smallest
[lambda, sort_index] = sort(eig_vals,'descend');

lambda_ratio = cumsum(lambda) ./ sum(lambda)
```

Then plotting `lambda_ratio`, which is `cumsum ./ sum`, produces the following plot:



The current dataset can be described incredibly well just by looking at `Iz`; recall that the models are rotated so that `Iz` is the largest moment of inertia. Including `Iy` and `Ix` as well means that a 3D plot of the principal moments of inertia captures almost all the variation in the data.

The next step for characterization is to take only the inertias from the dataset. Since the current dataset is so small, the scaled dataset will be used for the rest of the characterization process; once more parts are added to the database, it will make sense to start looking at the raw dataset. Now we can proceed to cluster the data using the k-means method. To properly use k-means, a value of k, which is the number of clusters, needs to be determined. This can be done by creating an elbow plot using the following code:

```matlab
for ii = 1:20
    [idx,~,sumd] = kmeans(inertia,ii);
    J(ii) = norm(sumd);
end
```

Which produces the following plot:



As can be seen in the elbow plot above, at 6 clusters there is an "elbow", a large drop in the summed distance to the centroid of each cluster, which means that 6 is the optimal number of clusters. Plotting the inertias using 6 k-means clusters produces the following plot:



From this plot it is immediately clear that there are clusters of outliers. These are due to the different shapes: the extreme values are slender rods or flat plates, while the clusters closer to the center more closely resemble spheres. As the dataset grows, it should become more apparent what kinds of clusters actually make up a satellite, and eventually space debris in general.

### Next Steps

The current dataset needs to grow in both the amount and the variety of data. The most glaring issue with the current dataset is the lack of any debris scans, since the parts come straight from satellite assemblies. Getting accurate properties from the current scans we have has proven exceedingly difficult, so hopefully getting pieces that are easier to scan can help bring the project back on track. The other, harder-to-fix issue is finding or deriving more data properties. Properties such as cross-sectional area or aerodynamic drag would be very insightful, but they are likely to be difficult to implement in code and significantly more resource-intensive than the properties the code can currently derive.