report update
@misc{eberlyPolyhedralMassProperties2002,
  title = {Polyhedral {{Mass Properties}} ({{Revisited}})},
  author = {Eberly, David},
  year = {2002},
  month = dec,
  copyright = {CC BY 4.0},
  url = {https://www.geometrictools.com/Documentation/PolyhedralMassProperties.pdf}
}

@inproceedings{DebriSat2019,
  title = {Analysis of the DebriSat Fragments and Comparison to the NASA Standard Satellite Breakup Model},
  author = {Murray, James and Cowardin, Heather and Liou, J-C and Sorge, Marlon and Fitz-Coy, Norman and Huynh, Tom},
  booktitle = {International Orbital Debris Conference (IOC)},
  number = {JSC-E-DAA-TN73918},
  year = {2019},
  url = {https://ntrs.nasa.gov/citations/20190034081}
}

@online{interfluo6UCubeSatModel,
  title = {{{6U CubeSat}} Model | {{3D CAD Model Library}} | {{GrabCAD}}},
  author = {{Interfluo}},
  url = {https://grabcad.com/library/6u-cubesat-model-1},
  urldate = {2022-02-15}
}

## Gathering Data

To get started on the project before any scans of the actual debris are made available, I opted to
find 3D models online and process them as if they were data collected by my team. GrabCAD is an
excellent source of high-quality 3D models, and all the models have, at worst, a non-commercial
license, making them suitable for this study. The current dataset uses three separate satellite
assemblies found on GrabCAD; below is an example of one of the satellites that was used.

![Example CubeSat Used for Analysis [@interfluo6UCubeSatModel]](Figures/assembly.jpg)

## Data Preparation

The models were processed in Blender, which quickly converted the assemblies to `stl` files, giving
108 unique parts to be processed. Since the final size of the dataset is expected to be on the order
of thousands of parts, an algorithm capable of getting the required properties of each part is the
only feasible solution. From the analysis performed in
[Report 1](https://gitlab.com/orbital-debris-research/directed-study/report-1/-/blob/main/README.md),
we know that the essential debris property is the moments of inertia, which helped narrow down
potential algorithms. Unfortunately, this is one of the more complicated things to calculate from a
mesh, but thanks to a paper from [@eberlyPolyhedralMassProperties2002] titled
[Polyhedral Mass Properties](https://www.geometrictools.com/Documentation/PolyhedralMassProperties.pdf),
his algorithm could be implemented in the Julia programming language. The current implementation
calculates the moment of inertia tensor, volume, center of gravity, characteristic length, and
surface body dimensions in a few milliseconds per part. The library can be found
[here](https://gitlab.com/MisterBiggs/stl-process). The characteristic length is a value that is
heavily used by the NASA DebriSat project [@DebriSat2019], which is doing very similar work to this
project. The characteristic length takes the three maximum orthogonal dimensions of a body, sums
them, and divides by 3 to produce a single scalar value that gives an idea of the size of a 3D
object.
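
As a rough sketch of how the surface body dimensions and characteristic length can be computed
(using the axis-aligned bounding box as a simple stand-in for the true maximum orthogonal
dimensions; the actual implementation lives in the Julia library, so the MATLAB names here are
purely illustrative):

```matlab
% Illustrative sketch (not the stl-process implementation): bounding-box
% dimensions and a DebriSat-style characteristic length for one STL part.
TR    = stlread("part.stl");    % triangulation object (MATLAB R2018b+)
verts = TR.Points;              % N-by-3 matrix of vertex coordinates

dims = max(verts) - min(verts); % surface body dimensions [sbx sby sbz]
characteristic_length = sum(dims) / 3;
```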

![Current mesh processing pipeline](Figures/current_process.svg)

The algorithm's speed is critical not only because of the eventual large number of debris pieces
that will have to be processed, but also because many of the data science algorithms we plan on
running on the compiled data need the data to be normalized. For the current dataset and properties,
it makes the most sense to normalize the dataset based on volume. Volume was chosen for multiple
reasons, namely because it was easy to implement an efficient algorithm to calculate it, and because
volume currently produces the least variation out of the set of properties calculated.
Unfortunately, scaling a model to a specific volume is an iterative process, but it can be done very
efficiently using derivative-free numerical root-finding algorithms. The current implementation can
scale and process all the properties using only about 30% more time than getting the properties
without scaling.
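
To illustrate the scaling step, here is a minimal sketch of the root-finding setup; `mesh_volume`
and `scale_mesh` are hypothetical helpers standing in for the real mesh-processing routines:

```matlab
% Hypothetical sketch of volume normalization via derivative-free root finding.
target_volume = 1.0;   % normalize every part to the same volume

% Residual: how far the uniformly scaled mesh (verts as loaded above) is from the target volume.
residual = @(s) mesh_volume(scale_mesh(verts, s)) - target_volume;

s_star       = fzero(residual, 1.0);       % derivative-free root finder
scaled_verts = scale_mesh(verts, s_star);  % properties are then recomputed on this mesh
```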

```txt
 Row │ variable               mean        min          median      max
─────┼───────────────────────────────────────────────────────────────────
   1 │ surface_area           25.2002     5.60865      13.3338     159.406
   2 │ characteristic_length  79.5481     0.158521     1.55816     1582.23
   3 │ sbx                    1.40222     0.0417367    0.967078    10.0663
   4 │ sby                    3.3367      0.0125824    2.68461     9.68361
   5 │ sbz                    3.91184     0.29006      1.8185      14.7434
   6 │ Ix                     1.58725     0.0311782    0.23401     11.1335
   7 │ Iy                     3.74345     0.178598     1.01592     24.6735
   8 │ Iz                     5.20207     0.178686     1.742       32.0083
```

Above is a summary of the current 108-part dataset with scaling. Since all the volumes are the same,
volume is left out of the dataset; the center of gravity is also left out since it is currently just
an artifact of the `stl` file format. There are many ways to determine the 'center' of a 3D mesh,
but since only one is implemented at the moment, comparisons against the other properties don't make
sense. The other notable part of the data is that each model is rotated so that the magnitudes of
`Iz`, `Iy`, and `Ix` are in descending order, which makes sure that the original orientation of a
model doesn't matter for characterization.
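
A minimal sketch of this rotation into principal axes, assuming `I_tensor` is the 3x3 inertia tensor
computed for a part (the actual code lives in the Julia library):

```matlab
% Sketch: diagonalize the inertia tensor so the principal moments come out
% sorted, making Iz the largest regardless of the part's original orientation.
[V, D] = eig(I_tensor);                      % principal axes and principal moments
[moments, order] = sort(diag(D), 'ascend');  % Ix <= Iy <= Iz

Ix = moments(1);
Iy = moments(2);
Iz = moments(3);
R  = V(:, order);  % rotation aligning the body with its principal axes
```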

The dataset is available for download here:

- [dataset.csv](https://gitlab.com/orbital-debris-research/directed-study/report-2/-/blob/main/dataset.csv)
- [scaled_dataset.csv](https://gitlab.com/orbital-debris-research/directed-study/report-3/-/blob/main/scaled_dataset.csv)

## Characterization

The first step toward characterization is to perform a principal component analysis to determine
which properties of the data capture the most variation. PCA also requires that the data is scaled,
so, as discussed above, the dataset scaled by `volume` will be used. PCA is implemented manually
instead of with the Matlab built-in function, as shown below:

```matlab
% covariance matrix of data points
S = cov(scaled_data);

% eigenvalues of S
eig_vals = eig(S);

% sorting eigenvalues from largest to smallest
[lambda, sort_index] = sort(eig_vals, 'descend');

% fraction of total variance explained by the first n components
lambda_ratio = cumsum(lambda) ./ sum(lambda)
```
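
The cumulative explained variance can then be visualized with something like the following sketch
(not necessarily the exact plotting code used):

```matlab
% Plot the fraction of total variance explained by the first n principal components.
figure
plot(lambda_ratio, '-o')
xlabel('Number of principal components')
ylabel('Cumulative fraction of variance explained')
```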

Plotting `lambda_ratio`, which is the `cumsum` of the eigenvalues divided by their `sum`, produces
the following plot:

![PCA Plot](Figures/pca.png)

The current dataset can be described incredibly well just by looking at `Iz`; recall that the models
are rotated so that `Iz` is the largest moment of inertia. Including `Iy` and `Ix` as well means
that a 3D plot of the principal moments of inertia captures almost all of the variation in the data.

The next step for characterization is to extract only the inertias from the dataset. Since the
current dataset is so small, the scaled dataset will be used for the rest of the characterization
process. Once more parts are added to the database, it will make sense to start looking at the raw
dataset. Now we can proceed to cluster the data using the k-means method. To properly use k-means, a
value of k, the number of clusters, needs to be determined. This can be done by creating an elbow
plot using the following code:

```matlab
% Run k-means for k = 1..20 and record the total within-cluster distance.
for ii = 1:20
    [idx, ~, sumd] = kmeans(inertia, ii);
    J(ii) = norm(sumd);
end
```

Which produces the following plot:

![Elbow method to determine the required number of clusters.](Figures/kmeans.png)

As can be seen in the above elbow plot, there is an "elbow" at 6 clusters, after which the drop in
the summed distance to each cluster's centroid levels off, which suggests that 6 is the optimal
number of clusters.
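
A sketch of how the clustering and plot can be produced (illustrative code; `inertia` is the N-by-3
matrix of principal moments from the scaled dataset):

```matlab
% Cluster the principal moments of inertia into 6 groups and view them in 3D.
k = 6;
[idx, C] = kmeans(inertia, k);

scatter3(inertia(:,1), inertia(:,2), inertia(:,3), 36, idx, 'filled')
hold on
plot3(C(:,1), C(:,2), C(:,3), 'kx', 'MarkerSize', 12, 'LineWidth', 2)  % cluster centroids
xlabel('I_x'); ylabel('I_y'); zlabel('I_z')
```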

Plotting the inertias with 6 k-means clusters then produces the following plot:

![Moments of Inertia plotted with 6 clusters.](Figures/inertia3d.png)

From this plot it is immediately clear that there are clusters of outliers. These are due to the
different shapes: the extreme values are slender rods or flat plates, while the clusters closer to
the center more closely resemble spheres. As the dataset grows, it should become more apparent what
kinds of clusters actually make up a satellite, and eventually space debris in general.

## Next Steps

The current dataset needs to be grown in both the amount and the variety of data. The most glaring
issue with the current dataset is the lack of any debris, since the parts come straight from
satellite assemblies. Getting accurate properties from the current scans we have is an entire
research project in itself, so hopefully getting pieces that are easier to scan can help bring the
project back on track. The other, harder-to-fix issue is finding or deriving more data properties.
Properties such as cross-sectional area or aerodynamic drag would be very insightful, but are likely
to be difficult to implement in code and significantly more resource intensive than the properties
the code can currently derive.