report rough draft

Anson 2022-04-03 02:08:46 -07:00
parent f3cb64a2dc
commit 3dc2676854
7 changed files with 111 additions and 65 deletions

BIN Figures/assembly.jpg Normal file (binary not shown, 876 KiB)
BIN Figures/biplots.png Normal file (binary not shown, 77 KiB)
File diff suppressed because one or more lines are too long (47 KiB)

README.md
@@ -1,92 +1,77 @@
---
title: "Machine Learning Methods for Orbital Debris Characterization: Report 2"
author: Anson Biggs
date: 2022-02-14
---
## Gathering Data
To get started on the project before any scans of the actual debris are made available, I opted to find similar 3D models online and process them as if they were data collected by my team. GrabCAD is an excellent source of high-quality 3D models, and all of its models carry at worst a non-commercial license, making them suitable for this study. To start, I downloaded a high-quality model of a 6U CubeSat, which coincidentally enough was designed to detect orbital debris. This model consists of 48 individual parts, most of which are unique.
![Example CubeSat Used for Analysis](Figures/assembly.jpg)
## Data Preparation
The current dataset uses three separate satellite assemblies found on GrabCAD that were quickly converted to `stl` files using Blender, yielding 108 unique parts to be processed. Since the final dataset is expected to be on the order of thousands of parts, an algorithm capable of computing the required properties of each part is the only feasible solution. From the analysis performed in [Report 1](https://gitlab.com/orbital-debris-research/directed-study/report-1/-/blob/main/README.md), we know that the most important features in the data are the moments of inertia. Unfortunately, these are among the harder quantities to calculate from a mesh, but thanks to David Eberly's 2002 paper [Polyhedral Mass Properties](https://www.geometrictools.com/Documentation/PolyhedralMassProperties.pdf), I was able to replicate his algorithm in the Julia programming language. The current implementation calculates the moment of inertia tensor, volume, and center of gravity of a part in a few milliseconds.
![Current Process](Figures/current_process.svg)
The speed of the algorithm matters not only because of the eventually large number of debris pieces that will have to be processed, but also because many of the data-science algorithms we plan to run on the compiled data require it to be normalized. For now, I have decided that it makes the most sense to normalize the dataset by volume. I chose volume for a few reasons, namely that it was easy to come up with an efficient algorithm to calculate it, and that volume currently seems to be the least important property for analysis of the data. Scaling all the models to the same volume can thankfully be done very efficiently using derivative-free numerical root-finding algorithms. The current implementation scales a model and computes all of its properties in only about 30% more time than computing the properties without scaling.
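As a rough illustration of that scaling step, here is a hypothetical sketch using the `find_zero` routine from Roots.jl; `scale` and `volume` stand in for the project's actual helpers and are assumptions here. For uniform scaling the answer also has the closed form s = (V_target / V)^(1/3), since volume grows with the cube of the scale factor, so the root finder converges almost immediately.

```julia
using Roots

# Hypothetical sketch: find the uniform scale factor s that gives a mesh
# a target volume, using a derivative-free root finder. `scale(mesh, s)`
# and `volume(mesh)` stand in for the project's real helpers.
function normalize_volume(mesh; target = 1.0)
    f(s) = volume(scale(mesh, s)) - target
    s = find_zero(f, 1.0)  # derivative-free iteration starting from s = 1
    return scale(mesh, s)
end
```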
```txt
Row │ variable mean min median max
│ Symbol Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────────────────
1 │ volume 0.00977609 1.05875e-10 2.0558e-5 0.893002
2 │ cx -0.836477 -3.13272 -0.00135877 0.0866989
3 │ cy -1.52983 -5.07001 -0.101678 0.177574
4 │ cz 0.162855 -6.83716 0.00115068 7.60925
5 │ Ix 0.00425039 -5.2943e-7 9.10038e-9 0.445278
6 │ Iy 0.0108781 1.05468e-17 1.13704e-8 1.14249
7 │ Iz 0.0111086 1.05596e-17 2.1906e-8 1.15363
```
Above is a summary of the current dataset without scaling. You may notice that the max values are well above the medians; given the small size of the dataset, there are still significant outliers. For now, any significant outliers will be removed, with more explanation below, but hopefully as the dataset grows this will become less necessary and will not shrink the dataset as much. As mentioned before, both a raw and a normalized dataset were prepared; the data can be found here: [dataset.csv](https://gitlab.com/orbital-debris-research/directed-study/report-2/-/blob/main/dataset.csv), [scaled_dataset.csv](https://gitlab.com/orbital-debris-research/directed-study/report-2/-/blob/main/scaled_dataset.csv)
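The summary above follows the layout of `describe` from DataFrames.jl; a hypothetical snippet to regenerate it, assuming the raw dataset loads cleanly as a DataFrame, would look like this:

```julia
using CSV, DataFrames

# Assumes dataset.csv has the columns volume, cx, cy, cz, Ix, Iy, Iz.
df = CSV.read("dataset.csv", DataFrame)
describe(df, :mean, :min, :median, :max)
```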
## Characterization
The first step towards characterization is to perform principal component analysis to determine which properties are the most important. In the past, moments of inertia have been by far the most important for capturing the variation in the data, but since this dataset is significantly different from the previous one, it is important to verify that inertia still dominates. We begin by using the `pca` function in MATLAB on our scaled dataset.
```matlab
[coeff,score,latent] = pca(scaled_data);
```
We can then put the `coeff` and `score` returned by the `pca` function into a biplot to easily visualize which properties are the most important. Unfortunately, we exist in a 3D world, so the centers of gravity and moments of inertia have to be analyzed individually.
![3D BiPlots for PCA](Figures/biplots.png)
The components of all 6 properties are represented in each of the biplots by the blue lines, and the scores of each part are represented by the red dots. For the current dataset, the variation of the data is captured fairly well by both the inertia and the center of gravity. For now, I am going to continue using inertia, since it performs slightly better here and was clearly the best when the analysis was performed on just a single satellite. As the dataset grows and the model-ingestion pipeline becomes more robust, more time will be spent analyzing the properties.
Now that it has been determined that inertia will be used, k-means clustering can be performed on the raw, unscaled dataset.
```matlab
[IDX, C] = kmeans(inertia,3);
histcounts(IDX) % Get the size of each cluster
89 10 8
```
![Scatter of all Data](Figures/first_scatter.png)
There are 4 pretty distinct groups in this data, with a lot of overlap in the larger groups. To get a better view, only the group with the smallest moments of inertia will be kept, and k-means will be performed again to get a better idea of what the data looks like.
```matlab
inertia = inertia(IDX == 1,:);
[IDX, C] = kmeans(inertia,3);
histcounts(IDX) % Get the size of each cluster
76 6 7
```
![Scatter of Smallest Group](Figures/final_scatter.png)
This brings the dataset down to 89 parts from the original 108 and still leaves some very small clusters. This really highlights the need to grow the dataset by around 10x, so that hopefully there won't be so many small, extremely localized clusters.
## Next Steps
The current dataset needs to grow in both the amount and the variety of data. The most glaring issue with the current dataset is the lack of any actual debris, since the parts come straight from satellite assemblies. Getting accurate properties from the current scans we have is an entire research project in itself, so hopefully acquiring pieces that are easier to scan can help bring the project back on track. The other, harder-to-fix issue is finding or deriving more data properties. Properties such as cross-sectional area or aerodynamic drag would be very insightful but are likely to be difficult to implement in code and significantly more resource-intensive than the properties the code currently derives. Before the next report, I would like to see this dataset grow closer to one thousand pieces.

characterization.m Normal file
@@ -0,0 +1,45 @@
clear all
rng(sum('anson')) % fixed seed so the k-means results are reproducible
colormap cool

% Load the scaled dataset; columns: volume,cx,cy,cz,Ix,Iy,Iz
scaled_data = readmatrix('C:\Coding\report-2\scaled_dataset.csv');
scaled_data = scaled_data(:,2:end); % drop volume, constant after scaling

% Principal component analysis on the scaled data
[coeff,score,latent] = pca(scaled_data);

% Biplot of the first three principal components (center of gravity)
figure
biplot(coeff(:,1:3),'scores',score(:,1:3),'varlabels',{'c_x','c_y','c_z','I_x','I_y','I_z'});
title("Center of Gravity Biplot")
xlabel("Center of Mass x")
ylabel("Center of Mass y")
zlabel("Center of Mass z")

% Biplot of the last three principal components (moments of inertia)
figure
biplot(coeff(:,4:6),'scores',score(:,4:6),'varlabels',{'c_x','c_y','c_z','I_x','I_y','I_z'});
title("Inertia Biplot")
xlabel("I_x")
ylabel("I_y")
zlabel("I_z")

% k-means clustering on the raw, unscaled moments of inertia
data = readmatrix('C:\Coding\report-2\dataset.csv');
inertia = data(:,end-2:end);
inertia = rmoutliers(inertia, 'percentile', [0 99]); % remove single huge outlier
[IDX, C] = kmeans(inertia,3);
histcounts(IDX) % cluster sizes
figure
scatter3(inertia(:,1), inertia(:,2), inertia(:,3), [], IDX, 'filled')

% Keep only cluster 1 (the parts with the smallest inertia values)
% and cluster again for a closer look
inertia = inertia(IDX == 1,:);
[IDX, C] = kmeans(inertia,3);
histcounts(IDX) % cluster sizes
figure
scatter3(inertia(IDX == 1,1), inertia(IDX == 1,2), inertia(IDX == 1,3), 'filled')
hold on
for i = 2:max(IDX)
    scatter3(inertia(IDX == i,1), inertia(IDX == i,2), inertia(IDX == i,3), 'filled')
end
legend("Cluster 1", "Cluster 2", "Cluster 3") % one entry per plotted cluster
title("Satellite Parts Moments of Inertia (mm^4)")
xlabel("I_x")
ylabel("I_y")
zlabel("I_z")

BIN characterization.mlx Normal file (binary not shown)