mirror of https://gitlab.com/orbital-debris-research/directed-study/report-2.git
synced 2025-06-16 15:06:51 +00:00

editing pass

This commit is contained in: parent 12b023df81, commit 8f318a5586

README.md

## Gathering Data

To get started on the project before any scans of the actual debris are made available, I opted to find 3D models online and process them as if they were data collected by my team. GrabCAD is an excellent source of high-quality 3D models, and all of the models have, at worst, a non-commercial license, making them suitable for this study. The current dataset uses three separate satellite assemblies found on GrabCAD; below is an example of one of the satellites that was used.


|

|
||||||
|
|
||||||
## Data Preparation

The models were processed in Blender, which quickly converted the assemblies to `stl` files, giving 108 unique parts to be processed. Since the final size of the dataset is expected to be on the order of thousands of parts, an automated algorithm capable of computing the required properties of each part is the only feasible solution. From the analysis performed in [Report 1](https://gitlab.com/orbital-debris-research/directed-study/report-1/-/blob/main/README.md), we know that the most important part of the data is the moments of inertia, which helped narrow down potential algorithms. Unfortunately, inertia is one of the more complicated properties to calculate from a mesh, but thanks to a 2002 paper by David Eberly titled [Polyhedral Mass Properties](https://www.geometrictools.com/Documentation/PolyhedralMassProperties.pdf), I was able to implement his algorithm in the Julia programming language. The current implementation calculates a moment of inertia tensor, volume, and center of gravity in a few milliseconds per part.

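For reference, the shape of that computation can be sketched as follows. This is a minimal sketch rather than the project's actual code: it assumes a watertight triangle mesh given as a list of vertex positions plus a list of vertex-index triples, the function and argument names are hypothetical, and it uses a signed-tetrahedron decomposition that gives the same quantities as Eberly's per-face integrals rather than a line-for-line port of his paper.

```julia
using LinearAlgebra

# Second moments ∫ xᵢxⱼ dV of the canonical tetrahedron with vertices 0, e₁, e₂, e₃.
const S_CANON = [1/60 1/120 1/120;
                 1/120 1/60 1/120;
                 1/120 1/120 1/60]

# vertices: vector of 3-element position vectors; faces: vector of (i, j, k) index triples.
function mass_properties(vertices, faces)
    volume = 0.0
    first_moment = zeros(3)      # ∫ x dV, gives the center of gravity
    second_moment = zeros(3, 3)  # ∫ x xᵀ dV, gives the inertia tensor
    for (i, j, k) in faces
        A = hcat(vertices[i], vertices[j], vertices[k])  # face vertices as columns
        d = det(A)                                       # 6 × signed tetrahedron volume
        volume += d / 6
        first_moment += d / 6 .* (vertices[i] .+ vertices[j] .+ vertices[k]) ./ 4
        second_moment += d .* (A * S_CANON * A')
    end
    c = first_moment ./ volume                           # center of gravity
    # Unit-density inertia tensor about the origin, then shifted to the c.g.
    I_origin = tr(second_moment) * I(3) - second_moment
    I_cg = I_origin - volume * (dot(c, c) * I(3) - c * c')
    return (volume = volume, cg = c, inertia = Matrix(I_cg))
end
```

For unit density, the last two lines are just the parallel-axis theorem applied to the accumulated second moments; a real density would only rescale the result.
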

|

|
||||||
|
|
||||||
The algorithm's speed is critical, not only because of the eventually large number of debris pieces that have to be processed, but also because many of the data-science algorithms we plan to run on the compiled data need the data to be normalized. I have decided that it makes the most sense to normalize the dataset based on volume. I chose volume for a few reasons: it was easy to come up with an efficient algorithm to calculate it, and it currently seems to be the least essential property for the data analysis. Scaling all the models to the same volume can be done very efficiently using derivative-free numerical root-finding algorithms. The current implementation can scale and process all the properties using only 30% more time than computing the properties without scaling. Finding the correct scale is an iterative process, so scaling may become significantly more expensive as more complex models become available.

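A minimal sketch of the volume-normalization step is shown below. It is not the project's code: it reuses the hypothetical `mass_properties` helper from the previous sketch, assumes the Roots.jl package for the derivative-free solve, and the target volume of 1 is an arbitrary placeholder.

```julia
using Roots  # derivative-free root finding via find_zero

# The volume of a uniformly scaled mesh grows as s³, so the scale factor that hits
# `target_volume` is the root of f(s) = volume(s · mesh) − target_volume.
function normalization_scale(vertices, faces; target_volume = 1.0)
    f(s) = mass_properties([s .* v for v in vertices], faces).volume - target_volume
    return find_zero(f, 1.0)  # derivative-free solve, starting from s = 1
end
```

Because volume scales exactly as s³, the same factor could also be written in closed form as `cbrt(target_volume / volume)`; the root-finding form mirrors the iterative, derivative-free approach described above.
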
```txt
 Row │ variable  mean       min          median     max
   ⋮ │ ⋮         ⋮          ⋮            ⋮          ⋮
   7 │ Iz        0.0111086  1.05596e-17  2.1906e-8  1.15363
```

Above is a summary of the current dataset without scaling. The max values are well above the median, and given the dataset's small size, there are still significant outliers. For now, any significant outliers will be removed, with more explanation below, but hopefully this will become less necessary, and shrink the dataset less, as the dataset grows. As mentioned before, both a raw and a normalized dataset were prepared, and the data can be found here: [dataset.csv](https://gitlab.com/orbital-debris-research/directed-study/report-2/-/blob/main/dataset.csv), [scaled_dataset.csv](https://gitlab.com/orbital-debris-research/directed-study/report-2/-/blob/main/scaled_dataset.csv)

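The summary above has the shape of a DataFrames.jl `describe` table; a minimal sketch of how such a summary can be produced from the published CSV is shown below. CSV.jl and DataFrames.jl are assumptions here, as is the exact column layout of `dataset.csv`.

```julia
using CSV, DataFrames

# Load the raw dataset linked above and print the same per-column statistics.
df = CSV.read("dataset.csv", DataFrame)
describe(df, :mean, :min, :median, :max)
```
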
## Characterization

The first step toward characterization is to perform a principal component analysis to determine which properties are the most important. In the past, moments of inertia have been by far the most important for capturing the variation in the data. However, since this dataset is significantly different from the previous one, it is worth confirming that inertia is still the most important. We begin by using the `pca` function in MATLAB on our scaled dataset.

```matlab
[coeff,score,latent] = pca(scaled_data);
```

We can then put the `coeff` and `score` returned by the `pca` function into a biplot to easily visualize which properties are the most important. Unfortunately, a biplot can only show three dimensions at a time, so the centers of gravity and the moments of inertia have to be analyzed individually.


|

|
||||||
|
|
||||||
The components of all six properties are represented in each of the biplots by the blue lines, and the red dots represent the scores of each part. For the current dataset, the variation in the data is captured fairly well by both the inertia and the center of gravity. I will continue using inertia, since it performs slightly better here and was clearly the best when the analysis was performed on just a single satellite. As the dataset grows and the model-ingestion pipeline becomes more robust, more time will be spent analyzing the properties.

Now that it has been determined that inertia will be used, k-means clustering can be performed on the raw, unscaled dataset.

```matlab
[IDX, C] = kmeans(inertia,3);
histcounts(IDX) % Get the size of each cluster
```


|

|
||||||
|
|
||||||
There are four fairly distinct groups in this data, with a lot of overlap in the larger groups. To get a better view, only the smallest-magnitude group will be kept, since it appears to have the most variation, and k-means will be performed on it again to better understand the data.

```matlab
inertia = inertia(IDX == 1,:); % keep only the smallest cluster, then re-run k-means
histcounts(IDX) % Get the size of each cluster
```


|

|
||||||
|
|
||||||
This brings the dataset down to 89 parts from the original 108 and still leaves some very small clusters. This highlights the need to grow the dataset by around 10x, so that hopefully there will not be so many small, highly localized clusters.

## Next Steps

The current dataset needs to grow in both the amount and the variety of data. The most glaring issue with the current dataset is the lack of any debris, since the parts come straight from satellite assemblies. Getting accurate properties from the current scans we have is an entire research project in itself, so hopefully getting pieces that are easier to scan can help bring the project back on track. The other, harder-to-fix issue is finding or deriving more data properties. Properties such as cross-sectional area or aerodynamic drag would be very insightful, but they are likely to be difficult to implement in code and significantly more resource-intensive than the properties the code can currently derive. Characteristic length is used heavily by NASA's DebriSat project and seems straightforward to implement, so that will be the next goal for the mesh-processing code. Before the next report, I would like to see this dataset grow closer to one thousand pieces.

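As a rough sketch of where that could start: characteristic length, as DebriSat uses it, averages an object's largest dimensions along three orthogonal directions. The snippet below approximates that with the axis-aligned bounding box of the mesh vertices, which is a deliberate simplification and not the planned implementation (the function name is hypothetical).

```julia
# Hedged approximation: the true characteristic length uses the largest dimensions along
# three orthogonal axes chosen per object; an axis-aligned bounding box ignores the
# part's orientation but gives a first estimate from the same vertex list used above.
function characteristic_length(vertices)
    extents = [maximum(v[i] for v in vertices) - minimum(v[i] for v in vertices) for i in 1:3]
    return sum(extents) / 3
end
```
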