rough draft complete

2025-09-18 19:42:45 +00:00 · 2022-02-14 21:49:15 -07:00
parent 31725d6b49
commit 7fe5680a67
8 changed files with 357 additions and 28 deletions
--- a/README.md
+++ b/README.md
@@ -1,17 +1,20 @@
 ---
-title: "Machine Learning Methods for Orbital Debris Characterization"
-description: |
-  A short description of the post.
-author:
-  - name: Anson Biggs
-    url: https://ansonbiggs.com
-date: 2022-02-13
-output:
-  distill::distill_article:
-    self_contained: false
-draft: true
+title: "Machine Learning Methods for Orbital Debris Characterization: Report 1"
+# description: |
+#   A short description of the post.
+author: Anson Biggs
+# author:
+# - name: Anson Biggs
+# url: https://ansonbiggs.com
+date: 2022-02-14
+# output:
+#   distill::distill_article:
+#     self_contained: false
+# draft: true
 ---

+## Gathering Data
+
 To get started on the project before any scans of the real debris are made available, I opted to find similar 3D models online and process them as if they were data collected by my team. GrabCAD is an excellent source of high-quality 3D models, and all of the models have at worst a non-commercial license, making them suitable for this study. To start, I downloaded a high-quality model of a 6U CubeSat, which coincidentally was designed for detection of orbital debris. This model consists of 48 individual parts, most of which are unique.

 ![CubeSat Used for Analysis](assembly.jpg)
@@ -32,6 +35,40 @@ Physical
 ...
 ```

-The full file of the compiled parts properties from Fusion 360 can be seen [here.](https://gitlab.com/orbital-debris-research/fusion-properties/-/blob/main/compiled.csv) This method gave 22 columns of data but most of the columns are unsuitable for characterization of 3D geometry. Its important that the only properties considered are scalars that are independent of a models orientation of position in space. Part of the data provided was a moment of inertia tensor. This was computed down to $I_x$, $I_y$, and $I_z$, which was then used to compute an $\bar{I}$. Then bounding box length, width, and height were used to compute a total volume that the object takes up. In the end the only properties used in the analysis of the parts were: mass, volume, density, area,bounding box volume, $\bar{I}$. Some parts also had to be removed due to being outliers to the final dataset is 44 rows and 6 columns.
+The full file of the compiled part properties from Fusion 360 can be seen [here.](https://gitlab.com/orbital-debris-research/fusion-properties/-/blob/main/compiled.csv) This method gave 22 columns of data, but most of the columns are unsuitable for characterization of 3D geometry. It's important that the only properties considered are scalars that are independent of a model's orientation or position in space. Part of the data provided was a moment of inertia tensor. This was reduced to $I_x$, $I_y$, and $I_z$, which were then used to compute a single $\bar{I}$. The bounding box length, width, and height were then used to compute the bounding box volume, the total volume that the object takes up. In the end, the only properties used in the analysis of the parts were mass, volume, density, area, bounding box volume, and $\bar{I}$. Some parts also had to be removed because they were outliers, so the final dataset is 44 rows and 6 columns. Below is a scatter plot matrix (splom), which is a great way to visualize high-dimensional data. As the plot shows, most of the properties correlate with one another.

-![Data before PCA](prepped.png)
+![Data before clustering](prepped.svg)
+
+Now that the data is processed and clean, characterization in MATLAB can begin. The original idea was to perform _PCA_, but the method had difficulty producing meaningful results. This is likely due to the dataset being very small for machine learning and to the high variation in the data. Application of _PCA_ will be revisited once the dataset grows. The first step for characterization is importing our data into MATLAB.
+
+```m
+data = readmatrix('prepped.csv');
+```
+
+Next, _k-means_ will be used to cluster the data. Since it is hard to visualize data in more than two dimensions, only two columns of data will be provided for the clustering. For now, it makes the most intuitive sense to treat volume and mass as the most important columns, and the volume vs. mass plot shows 3 fairly distinct groups.
+
+```m
+[idx,C] = kmeans(data(:,1:2),3);
+```
+
+We can look at the distribution of parts in each cluster to ensure that each cluster has a good amount of data. Since _k-means_ is an iterative method that relies on a user-supplied guess for the number of clusters and on random initialization, it's important to make sure that the clusters make some sense.
+
+```m
+>> histcounts(idx)
+
+ans =
+
+    22    13     9
+```
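+
+Because the result depends on the random starting centroids, a more repeatable version of the clustering step might look like the sketch below. The seed value, the replicate count, and the variable names `X`, `counts`, and `totalDistance` are illustrative assumptions, not what was used for this report. The `'Replicates'` option asks `kmeans` to rerun from several random starts and keep the solution with the lowest total within-cluster distance.
+
+```m
+% Sketch only: the seed and replicate count are assumed values.
+data = readmatrix('prepped.csv');
+X = data(:,1:2);              % same two columns used for clustering above
+
+rng(1);                       % fix the random seed so the run is repeatable
+k = 3;                        % user-chosen number of clusters
+[idx,C,sumd] = kmeans(X,k,'Replicates',10);  % keep the best of 10 random starts
+
+counts = accumarray(idx,1).'  % number of parts assigned to each cluster
+totalDistance = sum(sumd)     % total within-cluster distance; lower is tighter
+```
+
+Comparing `counts` and `totalDistance` across a few values of `k` gives a quick check on whether 3 clusters is a reasonable guess.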
+
+Plotting volume vs. mass using our clusters gives the following plot. The clusters make intuitive sense, but it is clear that the dataset needs much more data for <strong style="color:#00ff00;">Cluster 3</strong>.
+
+![Volume and Mass clusters](clusters.svg)
+
+Below is all of the data, colored by cluster. Since _k-means_ only used mass and volume to come up with its clusters, some of the properties do not cluster well against each other. This plot is also a useful cursory look at which properties are correlated.
+
+![Clusters for all data](prepped_clustered.svg)
+
+## Next Steps
+
+The current dataset needs to grow in both the amount of data and the variety of data. The most glaring issue with the current dataset is that there are only two different material types. Modern satellites, and therefore their debris, are composed of dozens of unique materials. The other issue, which is harder to fix, is finding more properties of the data. Properties such as cross-sectional area or aerodynamic drag would be very insightful, but at the moment there is no good way to collect that data. Thankfully, with the 3D scanner, methods to obtain more properties can be developed and then applied over the entire dataset.
+
+Once the dataset has grown, more advanced analysis can begin. PCA is the current goal and can hopefully be applied by the next report.
--- a/README.pdf
+++ b/README.pdf
--- a/clusters.svg
+++ b/clusters.svg
--- a/prep/prep.jl
+++ b/prep/prep.jl
@@ -41,29 +41,24 @@ begin
        df.material_index = mats
    end

-    # Remove columns not needed for analysis
-    # df = df[!, [:mass, :volume, :density, :area, :bb_volume, :Ibar, :material_index]]
-
    # Remove outliers
    df = df[df.box.<1e6, :]
    df = df[df.mass.<1000, :]
 end

 # @df df cornerplot(cols(1:7), compact = true)
+features = [:mass, :volume, :density, :area, :box, :Ibar, :material_index]

-# plot(df.mass)
-# histogram(df.mass)
-# scatter(df.mass, df.Ibar)
-
-features = [:mass, :volume, :density, :area, :box, :Ibar]
-
-plot(df, dimensions = features, kind = "splom", Layout(title = "Raw Data"))
-
-corner(df)
+p1 = plot(df, dimensions = features, kind = "splom", Layout(title = "Raw Data"))

 CSV.write("prepped.csv", df)

+df.cluster = [1, 3, 2, 1, 2, 1, 1, 3, 1, 3, 2, 3, 1, 1, 2, 2, 1, 3, 1, 3, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 3, 2, 1, 1, 2, 2, 3, 3, 2, 2, 2, 1,] # From matlab kmeans idx

-df = dataset(DataFrame, "iris")
-features = [:sepal_width, :sepal_length, :petal_width, :petal_length]
-plot(df, dimensions = features, color = :species, kind = "splom")
+
+p2 = plot(df, dimensions = features, color = :cluster, kind = "splom", Layout(title = "Clustered Data"))
+
+
+
+savefig(p1, "prepped.svg", width = 1000, height = 1000)
+savefig(p2, "prepped_clustered.svg", width = 1000, height = 1000)
--- a/prepped.png
+++ b/prepped.png
--- a/prepped.svg
+++ b/prepped.svg
--- a/prepped_clustered.svg
+++ b/prepped_clustered.svg
--- a/process.mlx
+++ b/process.mlx