mirror of https://gitlab.com/Anson-Projects/projects.git synced 2025-09-19 12:02:38 +00:00

8 Commits

Author SHA1 Message Date
85adfdf067 fix: allow publish job to run on feature branch for testing
- Make pages dependency optional to allow testing without pages deployment
- Add rule to allow publish job on ghost-content-extraction branch
- This enables testing the RSS feed parsing and error handling
2025-08-22 23:30:27 -06:00
c9e0264208 fix: improve RSS feed error handling and debugging
- Add comprehensive error handling for RSS feed fetching
- Log detailed error messages and feed content preview
- Handle empty feeds gracefully instead of panicking
- Exit early if no entries found instead of continuing with empty list
2025-08-22 23:29:57 -06:00
d3966eaf53 fix: remove unused slug field to eliminate warning 2025-08-22 11:23:26 -06:00
21ad5cb862 feat: restore ghost profile functionality for clean content extraction
- Restore Quarto ghost profiles in _quarto.yml for dual content rendering
- Restore ghost-iframe.css with clean styling for Ghost content
- Restore GitLab CI dual build: main site + ghost-content optimized version
- Restore extract_article_content() function in Rust for clean HTML extraction
- Update README to document the ghost profiles feature and how it works

This is the core feature of the MR: generating clean HTML content for Ghost
instead of using iframes by building a ghost-optimized version of the site.
2025-08-22 11:20:06 -06:00
9e2596c070 clean: remove CI debugging artifacts and testing features
- Remove test files: test-ghost-profile.md, test-local-deployment.sh, validate-ghost-extraction.sh, AGENTS.md
- Restore .gitlab-ci.yml to original state without debugging changes
- Restore _quarto.yml to original format without ghost profiles
- Remove ghost-iframe.css styling file
- Restore ghost-upload/.gitlab-ci.yml to original state without force-update job
- Simplify Rust code by removing force update functionality and content extraction
- Restore README.md to original state

Keeps core bug fixes: fixed get_slug() and proper Ghost API duplicate checking
2025-08-22 11:16:14 -06:00
f93746e2c0 remove non-functional cache for self-hosted runners 2025-08-22 11:09:38 -06:00
ae1be54f8f fix: remove trailing slash from slugs to fix Ghost API lookup
- Strip trailing slashes from slugs in get_slug() function
- This prevents double slashes in the Ghost API URL which was causing
  get_existing_post_id() to fail and create duplicate posts
2025-08-22 11:01:38 -06:00
e479c96e44 fix: prevent duplicate posts by using Ghost API instead of public URL check
- Remove unreliable check_if_post_exists function that checked public URLs
- Replace with get_existing_post_id which properly queries Ghost's Admin API
- This prevents duplicate posts when public URLs are temporarily unavailable
2025-08-22 10:49:38 -06:00
8 changed files with 88 additions and 386 deletions

View File

@@ -1,15 +1,10 @@
-stages:
-  - build
-  - deploy
 build:
   stage: build
   image:
     name: gcr.io/kaniko-project/executor:v1.23.2-debug
     entrypoint: [""]
   script:
-    - >
-      /kaniko/executor
+    - /kaniko/executor
       --context "${CI_PROJECT_DIR}"
       --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
       --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_BRANCH}"
@@ -27,7 +22,7 @@ staging:
     paths:
       - public
-deploy:
+pages:
   stage: deploy
   script:
     - echo "Publishing site..."
@@ -36,35 +31,6 @@ deploy:
   artifacts:
     paths:
       - public
-# Branch preview deployment (for testing)
-preview:
-  stage: deploy
-  script:
-    - echo "Deploying branch preview..."
-    - echo "Preview available at preview URL"
-  needs:
-    - job: staging
-      optional: true
-  artifacts:
-    paths:
-      - public
-  environment:
-    name: preview/$CI_COMMIT_REF_SLUG
-    url: https://${CI_PROJECT_PATH_SLUG}-${CI_COMMIT_REF_SLUG}.gitlab.io
-  rules:
-    - if: "$CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH"
-# GitLab Pages deployment (only on main branch)
-pages:
-  stage: deploy
-  script:
-    - echo "Publishing to GitLab Pages..."
-  needs:
-    - deploy
-  artifacts:
-    paths:
-      - public
   rules:
     - if: "$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH"

View File

@@ -1,46 +0,0 @@
# Repository Guidelines
## Project Structure & Module Organization
- `ghost-upload/`: Rust automation for Ghost CMS publishing.
- `posts/`: Quarto posts with Julia/Python code per post directory.
- `public/`: Quarto build output (generated by `quarto render`).
- Root: Quarto config (`_quarto.yml`), shared assets, CI/CD, docs.
## Build, Test, and Development Commands
- Rust (`ghost-upload/`):
- Build: `cd ghost-upload && cargo build`
- Run: `cd ghost-upload && cargo run`
- Test: `cd ghost-upload && cargo test` (single: `cargo test <test_name>`)
- Lint: `cd ghost-upload && cargo clippy`
- Format: `cd ghost-upload && cargo fmt`
- Julia (root or `posts/*/`):
- Packages: `julia -e "using Pkg; Pkg.instantiate()"`
- Precompile: `julia -e "using Pkg; Pkg.precompile()"`
- Run notebook/script: `julia <filename>.jl`
- Quarto (docs/site):
- Build site: `quarto render --to html --output-dir public`
- Preview: `quarto preview`
- Check: `quarto check`
- Docker: `docker build -t projects .` then `docker run projects`
## Coding Style & Naming Conventions
- Rust: `cargo fmt`; fix all `cargo clippy` warnings. Use `?` over `unwrap()`. Imports: std → external → local. Naming: snake_case (fn/vars), PascalCase (types). Public docs with `///`.
- Julia: 4-space indent; spaces around operators; group `using` at top; snake_case; prefer pipelines `|>` for DataFrames; handle expected errors with try-catch.
- Quarto: Include title/date in YAML; set `echo: false`, `warning: false` for clean outputs; descriptive figure captions and alt text.
## Testing Guidelines
- Rust: Unit tests for core logic; add integration tests for API calls. Run with `cargo test`. Organize tests near code or in `tests/`.
- Julia: Validate transformations and plots visually; keep scripts deterministic.
- Quarto: Manually review rendered HTML for links, figures, and warnings.
## Commit & Pull Request Guidelines
- Commits: Use clear, conventional messages (e.g., `feat:`, `fix:`, `docs:`). Scope small and focused.
- PRs: Provide description, linked issues, steps to validate (commands), and screenshots of rendered docs when relevant.
## Security & Configuration
- Environment variables: `kagi_api_key`, `admin_api_key`. Export locally (e.g., `export admin_api_key=...`); never commit secrets.
- Dependencies: Keep minimal and up-to-date. Prefer configuration via env vars over hardcoded values.
## CI/CD & Deployment
- GitLab CI builds Docker, renders Quarto to static hosting; Rust runs separately for content sync. Avoid pipeline changes unless necessary; include rationale in PRs if modified.

View File

@@ -1,8 +1,3 @@
-cache:
-  paths:
-    - ./ghost-upload/target/
-    - ./ghost-upload/cargo/
 publish:
   stage: deploy
   image: rust:latest
@@ -10,26 +5,8 @@ publish:
     - cd ./ghost-upload
     - cargo run
   needs:
-    - job: deploy
-      optional: true
-    - job: staging
-      optional: true
-# Manual trigger to force update all Ghost posts
-force-update-ghost:
-  stage: deploy
-  image: rust:latest
-  script:
-    - echo "🔄 Force updating all Ghost posts..."
-    - cd ./ghost-upload
-    - FORCE_UPDATE=true cargo run
-  needs:
-    - job: deploy
-      optional: true
-    - job: staging
+    - job: pages
       optional: true
   rules:
-    - when: manual
-      allow_failure: false
-  variables:
-    FORCE_UPDATE: "true"
+    - if: "$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH"
+    - if: "$CI_COMMIT_BRANCH == 'ghost-content-extraction'" # Allow testing on this branch

View File

@@ -4,36 +4,22 @@ This tool synchronizes posts from https://projects.ansonbiggs.com to the Ghost b
 ## Features
-- **Automatic sync**: Only uploads new posts by default
-- **Content extraction**: Fetches clean HTML content instead of using iframes
+- **Clean content extraction**: Uses Quarto ghost profile to generate clean HTML instead of iframes
+- **Duplicate prevention**: Checks Ghost Admin API to avoid creating duplicate posts
 - **AI summaries**: Uses Kagi Summarizer for post summaries
-- **Force update**: Manual trigger to update all existing posts
+- **Dual content rendering**: GitLab CI builds both main site and ghost-optimized versions
-## Usage
-### Normal Mode (Default)
-```bash
-cargo run
-```
-Only processes new posts that don't exist on the Ghost blog.
-### Force Update Mode
-```bash
-FORCE_UPDATE=true cargo run
-```
-Updates ALL posts, including existing ones. Useful for:
-- Updating content after changes
-- Refreshing summaries
-- Applying new styling/formatting
-## CI/CD Integration
-The GitLab CI pipeline includes:
-- **Automatic sync**: Runs after each deployment
-- **Manual force update**: Available as a manual trigger in GitLab UI
+## How It Works
+1. **Dual Build Process**: GitLab CI builds the site twice:
+   - Main site → `public/` (normal theme with navigation)
+   - Ghost content → `public/ghost-content/` (minimal theme for content extraction)
+2. **Content Extraction**: Rust tool fetches clean HTML from the ghost-content version instead of using iframes
+3. **Duplicate Detection**: Uses Ghost Admin API to check for existing posts by slug
 ## Environment Variables
 - `admin_api_key`: Ghost Admin API key (required)
 - `kagi_api_key`: Kagi Summarizer API key (required)
-- `FORCE_UPDATE`: Set to "true" to update all posts (optional)
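Review note on step 2 of "How It Works": the README says the ghost-optimized build lands in `public/ghost-content/`, so the extractor presumably has to map each feed link to its counterpart under that prefix. The body of `extract_article_content()` is not shown in this diff, so the sketch below is only an assumption about how that mapping could look. `ghost_content_url` is a hypothetical name, and the `/ghost-content/posts/...` layout is inferred from the README rather than confirmed.

```rust
/// Hypothetical helper (not from this repository): maps a main-site post link
/// to the assumed location of its ghost-optimized rendering.
///
/// Assumption: the ghost build mirrors the main site's `posts/` tree under a
/// `ghost-content/` prefix at the site root.
fn ghost_content_url(original_link: &str) -> Option<String> {
    // Split at the first "/posts/" so the site origin is kept intact.
    let (site, rest) = original_link.split_once("/posts/")?;
    Some(format!("{}/ghost-content/posts/{}", site, rest))
}
```

Returning `Option` lets the caller skip links that are not posts instead of panicking on them.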

View File

@@ -143,7 +143,7 @@ impl Post {
 }
 fn get_slug(link: &str) -> String {
-    link.split_once("/posts/").unwrap().1.to_string()
+    link.split_once("/posts/").unwrap().1.trim_end_matches('/').to_string()
 }
 async fn extract_article_content(original_link: &str) -> String {
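Review note: `get_slug` still calls `unwrap()`, so a feed link without a `/posts/` segment would panic. A possible follow-up, sketched under assumptions, is to return `Option<String>` while keeping the new trailing-slash trimming. `lookup_url` below is a hypothetical helper, not code from this repository; its URL shape is an assumption used only to illustrate how an untrimmed slug produces the double slash mentioned in the commit message.

```rust
/// Extracts the slug from a post link, tolerating a trailing slash.
/// Returns `None` instead of panicking when the link has no "/posts/"
/// segment or when nothing follows it.
fn get_slug(link: &str) -> Option<String> {
    let (_, rest) = link.split_once("/posts/")?;
    let slug = rest.trim_end_matches('/');
    if slug.is_empty() {
        None
    } else {
        Some(slug.to_string())
    }
}

/// Hypothetical helper: builds a lookup URL of the form
/// `<base>/posts/slug/<slug>/`. The exact path is an assumption for
/// illustration; the point is that the template appends its own slash.
fn lookup_url(base: &str, slug: &str) -> String {
    format!("{}/posts/slug/{}/", base.trim_end_matches('/'), slug)
}
```

With the trimmed slug, `lookup_url(base, "my-post")` ends in `my-post/`; passing the untrimmed `"my-post/"` ends in `my-post//`, which is the failure mode the commit describes.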
@@ -191,16 +191,6 @@ async fn extract_article_content(original_link: &str) -> String {
     }
 }
-async fn check_if_post_exists(entry: &Entry) -> bool {
-    let posts_url = "https://notes.ansonbiggs.com/";
-    let link = entry.links.first().unwrap().href.as_str();
-    let slug = get_slug(link);
-    match reqwest::get(format!("{}{}", posts_url, slug)).await {
-        Ok(response) => response.status().is_success(),
-        Err(_) => false,
-    }
-}
 #[derive(Deserialize, Debug)]
 struct GhostPostsResponse {
@@ -210,7 +200,6 @@ struct GhostPostsResponse {
 #[derive(Deserialize, Debug)]
 struct GhostPost {
     id: String,
-    slug: String,
 }
 async fn get_existing_post_id(slug: &str, token: &str) -> Option<String> {
@@ -239,10 +228,47 @@ async fn get_existing_post_id(slug: &str, token: &str) -> Option<String> {
 }
 async fn fetch_feed(url: &str) -> Vec<Entry> {
-    let content = reqwest::get(url).await.unwrap().text().await.unwrap();
-    let feed = parser::parse(content.as_bytes()).unwrap();
+    println!("Fetching RSS feed from: {}", url);
+    let response = reqwest::get(url).await;
+    let response = match response {
+        Ok(resp) => resp,
+        Err(e) => {
+            println!("Failed to fetch RSS feed: {}", e);
+            return vec![];
+        }
+    };
+    if !response.status().is_success() {
+        println!("RSS feed request failed with status: {}", response.status());
+        return vec![];
+    }
+    let content = match response.text().await {
+        Ok(text) => text,
+        Err(e) => {
+            println!("Failed to read RSS feed content: {}", e);
+            return vec![];
+        }
+    };
+    if content.trim().is_empty() {
+        println!("RSS feed content is empty");
+        return vec![];
+    }
+    println!("RSS feed content preview: {}", &content[..content.len().min(200)]);
+    let feed = match parser::parse(content.as_bytes()) {
+        Ok(f) => f,
+        Err(e) => {
+            println!("Failed to parse RSS feed: {:?}", e);
+            println!("Feed content starts with: {}", &content[..content.len().min(500)]);
+            return vec![];
+        }
+    };
+    println!("Successfully parsed RSS feed with {} entries", feed.entries.len());
     feed.entries
 }
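Review note: the two preview lines slice with `&content[..content.len().min(200)]` and `...min(500)`. String slices in Rust are indexed by byte, and slicing at an offset that falls inside a multi-byte UTF-8 character panics, so a feed containing non-ASCII text near byte 200 or 500 could reintroduce the panic this commit is trying to remove. One way to avoid that, as a minimal sketch (`preview` is a hypothetical helper name, not part of this change):

```rust
/// Returns at most `max_bytes` bytes of `s`, backing up to the nearest
/// character boundary so the slice never splits a UTF-8 sequence.
fn preview(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a character boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

The logging calls could then use `preview(&content, 200)` and `preview(&content, 500)` without changing what is printed for ASCII-only feeds.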
@@ -305,15 +331,7 @@ async fn main() {
     let ghost_api_url = "https://notes.ansonbiggs.com/ghost/api/v3/admin/posts/?source=html";
     let ghost_admin_api_key = env::var("admin_api_key").unwrap();
-    // Check if force update is enabled
-    let force_update = env::var("FORCE_UPDATE").unwrap_or_default() == "true";
-    if force_update {
-        println!("🔄 FORCE UPDATE MODE ENABLED");
-        println!(" This will update ALL posts, including existing ones.");
-    } else {
-        println!("📝 NORMAL MODE - Only publishing new posts");
-    }
     let feed = "https://projects.ansonbiggs.com/index.xml";
@@ -348,26 +366,30 @@ async fn main() {
     // Prepare the post data
     let entries = fetch_feed(feed).await;
-    let filtered_entries: Vec<Entry> = if force_update {
-        println!("🔄 Force update enabled - processing all {} posts", entries.len());
-        entries
-    } else {
+    if entries.is_empty() {
+        println!("No entries found in RSS feed or feed parsing failed. Exiting.");
+        return;
+    }
+    println!("Processing {} entries from RSS feed", entries.len());
     let post_exists_futures = entries.into_iter().map(|entry| {
         let entry_clone = entry.clone();
-        async move { (entry_clone, check_if_post_exists(&entry).await) }
+        let token_clone = token.clone();
+        async move {
+            let link = entry.links.first().unwrap().href.as_str();
+            let slug = get_slug(link);
+            (entry_clone, get_existing_post_id(&slug, &token_clone).await.is_some())
+        }
     });
     let post_exists_results = join_all(post_exists_futures).await;
-    let new_entries: Vec<Entry> = post_exists_results
+    let filtered_entries: Vec<Entry> = post_exists_results
         .into_iter()
         .filter_map(|(entry, exists)| if !exists { Some(entry) } else { None })
         .collect();
-    println!("📝 Found {} new posts to publish", new_entries.len());
-    new_entries
-    };
     if filtered_entries.is_empty() {
         println!("Nothing to post.");
         return;
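Review note: the filter treats `get_existing_post_id(...).await.is_some()` as "post exists". The body of `get_existing_post_id` is not shown in this diff; if it also returns `None` when the Admin API request itself fails, a transient API error would be indistinguishable from "not found" and the post would be published again. A possible way to keep those cases apart, as a sketch only (the `Lookup` enum and `select_new` are hypothetical names, not part of this change):

```rust
/// Hypothetical three-state result for a duplicate check.
#[derive(Debug, Clone, PartialEq)]
enum Lookup {
    /// The API returned a post with this id.
    Exists(String),
    /// The API answered and no post has this slug.
    Missing,
    /// The lookup failed; the string describes why.
    Unknown(String),
}

/// Splits lookup results into entries that are safe to publish and entries
/// that were skipped because their status could not be determined.
/// Entries that already exist are dropped.
fn select_new<T>(results: Vec<(T, Lookup)>) -> (Vec<T>, Vec<(T, String)>) {
    let mut to_publish = Vec::new();
    let mut skipped = Vec::new();
    for (entry, lookup) in results {
        match lookup {
            Lookup::Missing => to_publish.push(entry),
            Lookup::Exists(_) => {}
            Lookup::Unknown(reason) => skipped.push((entry, reason)),
        }
    }
    (to_publish, skipped)
}
```

Only entries confirmed as `Missing` are published; anything `Unknown` can be logged and retried on the next pipeline run rather than risking a duplicate.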
@@ -382,46 +404,21 @@ async fn main() {
             posts: vec![post.clone()],
         };
-        // Check if this is an update (for force_update mode)
-        let (method, url) = if force_update {
-            if let Some(existing_id) = get_existing_post_id(&post.slug, &token).await {
-                println!("🔄 Updating existing post: {}", post.title);
-                ("PUT", format!("https://notes.ansonbiggs.com/ghost/api/v3/admin/posts/{}/", existing_id))
-            } else {
-                println!("📝 Creating new post: {}", post.title);
-                ("POST", ghost_api_url.to_string())
-            }
-        } else {
-            println!("📝 Creating new post: {}", post.title);
-            ("POST", ghost_api_url.to_string())
-        };
-        let response = match method {
-            "PUT" => client
-                .put(&url)
-                .header("Authorization", format!("Ghost {}", token))
-                .json(&post_payload)
-                .send()
-                .await
-                .expect("Request failed"),
-            _ => client
-                .post(&url)
-                .header("Authorization", format!("Ghost {}", token))
-                .json(&post_payload)
-                .send()
-                .await
-                .expect("Request failed"),
-        };
+        let response = client
+            .post(ghost_api_url)
+            .header("Authorization", format!("Ghost {}", token))
+            .json(&post_payload)
+            .send()
+            .await
+            .expect("Request failed");
         // Check the response
         if response.status().is_success() {
-            let action = if method == "PUT" { "updated" } else { "published" };
-            println!("✅ Post '{}' {} successfully.", post.title, action);
+            println!("Post {} published successfully.", post.title);
         } else {
-            let action = if method == "PUT" { "update" } else { "publish" };
             println!(
-                "Failed to {} post '{}'.\n\tStatus: {}\n\tResponse: {:?}",
-                action, &post.title, response.status(), response.text().await.unwrap_or_default()
+                "Failed to publish post {}.\n\tResp: {:?}",
+                &post.title, response
             );
         }
     }

View File

@@ -1,34 +0,0 @@
# Test Ghost Profile Output
This is a test document to validate our ghost profile setup.
## Content Structure
The ghost profile should:
- Remove navigation elements
- Use minimal styling from ghost-iframe.css
- Maintain clean article layout
- Remove table of contents
## Code Example
```julia
println("Hello from Julia!")
x = 1 + 1
```
## Regular Content
This is just some regular markdown content to see how it renders in the ghost profile.
- List item 1
- List item 2
- List item 3
**Bold text** and *italic text* should render properly.
[Link to main site](https://projects.ansonbiggs.com)
## Summary
If you can see clean, minimal styling without navigation, the ghost profile is working correctly.

View File

@@ -1,55 +0,0 @@
#!/bin/bash
echo "🧪 Testing local deployment simulation..."
# Create test directories
mkdir -p test-output/main
mkdir -p test-output/ghost-content
echo "📁 Simulating dual-output build..."
# Test 1: Check if ghost profile exists
if grep -q "ghost:" _quarto.yml; then
echo "✅ Ghost profile configuration found"
else
echo "❌ Ghost profile not found"
exit 1
fi
# Test 2: Simulate content extraction
echo "🔍 Testing content extraction logic..."
cd ghost-upload
# Test with sample URL (without actually hitting network)
echo "📝 Testing Rust compilation and basic logic..."
if cargo check --quiet; then
echo "✅ Rust code compiles successfully"
else
echo "❌ Rust compilation failed"
exit 1
fi
cd ..
# Test 3: Check if CI would work
echo "🔧 Validating CI configuration..."
if ./validate-ghost-extraction.sh > /dev/null 2>&1; then
echo "✅ CI validation passed"
else
echo "❌ CI validation failed"
exit 1
fi
echo ""
echo "🎉 Local testing complete!"
echo ""
echo "📋 What happens in CI:"
echo " 1. Builds main site → public/"
echo " 2. Builds ghost content → public/ghost-content/"
echo " 3. Rust extracts from ghost-content URLs"
echo " 4. Posts to Ghost blog with clean HTML"
echo ""
echo "🚀 Ready for branch testing in GitLab CI!"
echo " • Download artifacts to see both outputs"
echo " • Use manual trigger to test force-update"
echo " • Check ghost-content/ folder structure"

View File

@@ -1,89 +0,0 @@
#!/bin/bash
# Simple validation script for ghost content extraction
echo "🔍 Validating ghost profile implementation..."
# Check if required files exist
echo "📁 Checking required files..."
if [ ! -f "_quarto.yml" ]; then
echo "❌ _quarto.yml not found"
exit 1
fi
if [ ! -f "ghost-iframe.css" ]; then
echo "❌ ghost-iframe.css not found"
exit 1
fi
if [ ! -f "ghost-upload/src/main.rs" ]; then
echo "❌ Rust source not found"
exit 1
fi
echo "✅ All required files present"
# Check if ghost profile is defined in _quarto.yml
echo "📋 Checking ghost profile configuration..."
if grep -q "ghost:" _quarto.yml; then
echo "✅ Ghost profile found in _quarto.yml"
else
echo "❌ Ghost profile not found in _quarto.yml"
exit 1
fi
# Check if GitLab CI builds both versions
echo "🔧 Checking GitLab CI configuration..."
if grep -q "ghost-content" .gitlab-ci.yml; then
echo "✅ GitLab CI configured for dual output"
else
echo "❌ GitLab CI not configured for ghost-content"
exit 1
fi
# Check if Rust code has extract_article_content function
echo "🦀 Checking Rust implementation..."
if grep -q "extract_article_content" ghost-upload/src/main.rs; then
echo "✅ Content extraction function found"
else
echo "❌ Content extraction function not found"
exit 1
fi
# Check if force update functionality is available
if grep -q "FORCE_UPDATE" ghost-upload/src/main.rs; then
echo "✅ Force update functionality found"
else
echo "❌ Force update functionality not found"
exit 1
fi
# Check if manual CI job is configured
if grep -q "force-update-ghost" ghost-upload/.gitlab-ci.yml; then
echo "✅ Manual force update CI job found"
else
echo "❌ Manual force update CI job not found"
exit 1
fi
# Verify Rust code compiles
echo "🛠️ Building Rust code..."
cd ghost-upload
if cargo check --quiet; then
echo "✅ Rust code compiles successfully"
else
echo "❌ Rust compilation failed"
exit 1
fi
cd ..
echo ""
echo "🎉 All validations passed!"
echo "📋 Summary of changes:"
echo " • Quarto profiles for dual-output rendering"
echo " • Ghost-optimized CSS styling"
echo " • GitLab CI builds both main site and ghost-content"
echo " • Rust extracts HTML content instead of using iframes"
echo " • Force update mode to refresh existing posts"
echo " • Manual CI trigger for content updates"
echo ""
echo "🚀 Ready for testing in CI/CD pipeline!"