mirror of https://gitlab.com/orbital-debris-research/directed-study/report-2.git
synced 2025-06-16 15:06:51 +00:00

editing pass

This commit is contained in: parent 12b023df81, commit 8f318a5586

README.md

## Gathering Data

To get started on the project before any scans of the actual debris are made available, I opted to find 3D models online and process them as if they were data collected by my team. GrabCAD is an excellent source of high-quality 3D models, and all of the models have, at worst, a non-commercial license, making them suitable for this study. The current dataset uses three separate satellite assemblies found on GrabCAD; below is an example of one of the satellites that was used.


|

|
||||||
|
|
||||||
## Data Preparation

The models were processed in Blender, which quickly converted the assemblies to `stl` files, giving 108 unique parts to be processed. Since the final size of the dataset is expected to be on the order of thousands of parts, an automated algorithm capable of computing the required properties of each part is the only feasible solution. From the analysis performed in [Report 1](https://gitlab.com/orbital-debris-research/directed-study/report-1/-/blob/main/README.md), we know that the most important part of the data is the moments of inertia, which helped narrow down potential algorithms. Unfortunately, inertia is one of the more complicated properties to calculate from a mesh, but thanks to a 2002 paper by David Eberly titled [Polyhedral Mass Properties](https://www.geometrictools.com/Documentation/PolyhedralMassProperties.pdf), I was able to implement his algorithm in the Julia programming language. The current implementation calculates a moment of inertia tensor, volume, and center of gravity in a few milliseconds per part.

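For reference, the shape of that computation can be sketched as follows. This is a minimal sketch rather than the project's actual code: it assumes a watertight triangle mesh given as a list of vertex positions plus a list of vertex-index triples, the function and argument names are hypothetical, and it uses a signed-tetrahedron decomposition that gives the same quantities as Eberly's per-face integrals rather than a line-for-line port of his paper.

```julia
using LinearAlgebra

# Second moments ∫ xᵢxⱼ dV of the canonical tetrahedron with vertices 0, e₁, e₂, e₃.
const S_CANON = [1/60 1/120 1/120;
                 1/120 1/60 1/120;
                 1/120 1/120 1/60]

# vertices: vector of 3-element position vectors; faces: vector of (i, j, k) index triples.
function mass_properties(vertices, faces)
    volume = 0.0
    first_moment = zeros(3)      # ∫ x dV, gives the center of gravity
    second_moment = zeros(3, 3)  # ∫ x xᵀ dV, gives the inertia tensor
    for (i, j, k) in faces
        A = hcat(vertices[i], vertices[j], vertices[k])  # face vertices as columns
        d = det(A)                                       # 6 × signed tetrahedron volume
        volume += d / 6
        first_moment += d / 6 .* (vertices[i] .+ vertices[j] .+ vertices[k]) ./ 4
        second_moment += d .* (A * S_CANON * A')
    end
    c = first_moment ./ volume                           # center of gravity
    # Unit-density inertia tensor about the origin, then shifted to the c.g.
    I_origin = tr(second_moment) * I(3) - second_moment
    I_cg = I_origin - volume * (dot(c, c) * I(3) - c * c')
    return (volume = volume, cg = c, inertia = Matrix(I_cg))
end
```

For unit density, the last two lines are just the parallel-axis theorem applied to the accumulated second moments; a real density would only rescale the result.
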

|

|
||||||
|
|
||||||
The algorithm's speed is critical, not only because of the eventually large number of debris pieces that have to be processed, but also because many of the data-science algorithms we plan to run on the compiled data need the data to be normalized. I have decided that it makes the most sense to normalize the dataset based on volume. I chose volume for a few reasons: it was easy to come up with an efficient algorithm to calculate it, and it currently seems to be the least essential property for the data analysis. Scaling all the models to the same volume can be done very efficiently using derivative-free numerical root-finding algorithms. The current implementation can scale and process all the properties using only 30% more time than computing the properties without scaling. Finding the correct scale is an iterative process, so scaling may become significantly more expensive as more complex models become available.

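A minimal sketch of the volume-normalization step is shown below. It is not the project's code: it reuses the hypothetical `mass_properties` helper from the previous sketch, assumes the Roots.jl package for the derivative-free solve, and the target volume of 1 is an arbitrary placeholder.

```julia
using Roots  # derivative-free root finding via find_zero

# The volume of a uniformly scaled mesh grows as s³, so the scale factor that hits
# `target_volume` is the root of f(s) = volume(s · mesh) − target_volume.
function normalization_scale(vertices, faces; target_volume = 1.0)
    f(s) = mass_properties([s .* v for v in vertices], faces).volume - target_volume
    return find_zero(f, 1.0)  # derivative-free solve, starting from s = 1
end
```

Because volume scales exactly as s³, the same factor could also be written in closed form as `cbrt(target_volume / volume)`; the root-finding form mirrors the iterative, derivative-free approach described above.
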
```txt
 Row │ variable  mean       min          median     max
   ⋮ │ ⋮         ⋮          ⋮            ⋮          ⋮
   7 │ Iz        0.0111086  1.05596e-17  2.1906e-8  1.15363
```

Above is a summary of the current dataset without scaling. The max values are well above the median, and given the dataset's small size, there are still significant outliers. For now, any significant outliers will be removed, with more explanation below, but hopefully this will become less necessary, and shrink the dataset less, as the dataset grows. As mentioned before, both a raw and a normalized dataset were prepared, and the data can be found here: [dataset.csv](https://gitlab.com/orbital-debris-research/directed-study/report-2/-/blob/main/dataset.csv), [scaled_dataset.csv](https://gitlab.com/orbital-debris-research/directed-study/report-2/-/blob/main/scaled_dataset.csv)

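The summary above has the shape of a DataFrames.jl `describe` table; a minimal sketch of how such a summary can be produced from the published CSV is shown below. CSV.jl and DataFrames.jl are assumptions here, as is the exact column layout of `dataset.csv`.

```julia
using CSV, DataFrames

# Load the raw dataset linked above and print the same per-column statistics.
df = CSV.read("dataset.csv", DataFrame)
describe(df, :mean, :min, :median, :max)
```
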
## Characterization

The first step toward characterization is to perform a principal component analysis to determine which properties are the most important. In the past, moments of inertia have been by far the most important for capturing the variation in the data. However, since this dataset is significantly different from the previous one, it is worth confirming that inertia is still the most important. We begin by using the `pca` function in MATLAB on our scaled dataset.

```matlab
[coeff,score,latent] = pca(scaled_data);
```

We can then put the `coeff` and `score` returned by the `pca` function into a biplot to easily visualize which properties are the most important. Unfortunately, a biplot can only show three dimensions at a time, so the centers of gravity and the moments of inertia have to be analyzed individually.


|

|
||||||
|
|
||||||
The components of all six properties are represented in each of the biplots by the blue lines, and the red dots represent the scores of each part. For the current dataset, the variation in the data is captured fairly well by both the inertia and the center of gravity. I will continue using inertia, since it performs slightly better here and was clearly the best when the analysis was performed on just a single satellite. As the dataset grows and the model-ingestion pipeline becomes more robust, more time will be spent analyzing the properties.

Now that it has been determined that inertia will be used, k-means clustering can be performed on the raw, unscaled dataset.

```matlab
[IDX, C] = kmeans(inertia,3);
histcounts(IDX) % Get the size of each cluster
```


|

|
||||||
|
|
||||||
There are four fairly distinct groups in this data, with a lot of overlap in the larger groups. To get a better view, only the smallest-magnitude group will be kept, since it appears to have the most variation, and k-means will be performed on it again to better understand the data.

```matlab
inertia = inertia(IDX == 1,:); % keep only the smallest cluster, then re-run k-means
histcounts(IDX) % Get the size of each cluster
```


|

|
||||||
|
|
||||||
This brings the dataset down to 89 parts from the original 108 and still leaves some very small clusters. This highlights the need to grow the dataset by around 10x, so that hopefully there will not be so many small, highly localized clusters.

## Next Steps

The current dataset needs to grow in both the amount and the variety of data. The most glaring issue with the current dataset is the lack of any debris, since the parts come straight from satellite assemblies. Getting accurate properties from the current scans we have is an entire research project in itself, so hopefully getting pieces that are easier to scan can help bring the project back on track. The other, harder-to-fix issue is finding or deriving more data properties. Properties such as cross-sectional area or aerodynamic drag would be very insightful, but they are likely to be difficult to implement in code and significantly more resource-intensive than the properties the code can currently derive. Characteristic length is used heavily by NASA's DebriSat project and seems straightforward to implement, so that will be the next goal for the mesh-processing code. Before the next report, I would like to see this dataset grow closer to one thousand pieces.

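As a rough sketch of where that could start: characteristic length, as DebriSat uses it, averages an object's largest dimensions along three orthogonal directions. The snippet below approximates that with the axis-aligned bounding box of the mesh vertices, which is a deliberate simplification and not the planned implementation (the function name is hypothetical).

```julia
# Hedged approximation: the true characteristic length uses the largest dimensions along
# three orthogonal axes chosen per object; an axis-aligned bounding box ignores the
# part's orientation but gives a first estimate from the same vertex list used above.
function characteristic_length(vertices)
    extents = [maximum(v[i] for v in vertices) - minimum(v[i] for v in vertices) for i in 1:3]
    return sum(extents) / 3
end
```
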