Advanced Parsing

Parsing is the core of the AirDoo module. It transforms raw Airbnb emails into structured data usable by Odoo.

Parser Architecture

graph TD
    A[Raw Email] --> B[HTML/Text Extraction]
    B --> C[Cleaning & Normalization]
    C --> D[Format Identification]
    D --> E{Specific Parsing}
    E -->|Format A| F[French Parser]
    E -->|Format B| G[English Parser]
    E -->|Format C| H[Portuguese Parser]
    F --> I[Data Validation]
    G --> I
    H --> I
    I --> J[Enrichment]
    J --> K[Structured Data]
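The stages in the diagram can be sketched as a chain of small functions. This is only an illustration, not AirDoo's actual code: every name below (`extract_text`, `normalize`, `identify_format`, `PARSERS`, `run_pipeline`) is hypothetical, the language detection is a deliberately naive keyword lookup, and a single stand-in parser replaces the three language-specific ones.

```python
import html
import re

def extract_text(raw_email):
    """HTML/Text extraction: drop tags and decode entities (naive)."""
    without_tags = re.sub(r'<[^>]+>', ' ', raw_email)
    return html.unescape(without_tags)

def normalize(text):
    """Cleaning & normalization: collapse runs of whitespace."""
    return re.sub(r'\s+', ' ', text).strip()

# Hypothetical marker words used to guess the email language.
LANGUAGE_MARKERS = {'fr': 'réservation', 'en': 'reservation', 'pt': 'reserva'}

def identify_format(text):
    """Format identification: return a language code, or None."""
    lowered = text.lower()
    for language, word in LANGUAGE_MARKERS.items():
        if re.search(r'\b' + re.escape(word) + r'\b', lowered):
            return language
    return None

def parse_generic(text):
    """Stand-in for a language-specific parser."""
    match = re.search(r'\b[A-Z0-9]{10}\b', text)
    return {'confirmation_code': match.group(0) if match else None}

# One parser per format; all three reuse the stand-in in this sketch.
PARSERS = {'fr': parse_generic, 'en': parse_generic, 'pt': parse_generic}

def run_pipeline(raw_email):
    text = normalize(extract_text(raw_email))
    language = identify_format(text)
    if language is None:
        return None  # unrecognized format
    data = PARSERS[language](text)
    if data['confirmation_code'] is None:
        return None  # validation failed
    data['source_language'] = language  # enrichment
    return data

result = run_pipeline('<p>Reservation confirmed</p><b>HM2YEYHZXC</b>')
```

Keeping each stage a separate function means a new language only needs a new entry in the dispatch table, which matches the modular layout the diagram suggests.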

Supported Formats

Languages

  • French: Standard France/Belgium/Switzerland format
  • English: International format
  • Portuguese: Brazil/Portugal format
  • Spanish: Spain/Latin America format
  • German: Germany/Austria/Switzerland format
  • Italian: Italy format

Email Types

  1. Booking confirmation (primary)
  2. Booking modification
  3. Booking cancellation
  4. Host message (optional)

Structure of Extracted Data

Required Data

{
    "confirmation_code": "HM2YEYHZXC",  # Unique Airbnb code
    "checkin_date": "2026-02-01",
    "checkout_date": "2026-02-07",
    "accommodation_name": "Chalet du Frenalay",
    "guest_name": "John Smith",
    "guest_email": "john@example.com",
    "total_amount": 980.00,
    "nights": 6,
    "status": "confirmed"
}

Optional Data

{
    "guest_phone": "+44712345678",
    "guest_composition": {
        "adults": 2,
        "children": 1,
        "babies": 0,
        "children_ages": [5]  # If available
    },
    "guest_notes": "Arriving around 4pm",
    "breakdown": {
        "nightly_rate": 150.00,
        "cleaning_fee": 80.00,
        "airbnb_fee": 147.00,
        "taxes": 49.00
    },
    "currency": "EUR",
    "source_language": "en"
}

Parsing Algorithms

1. Confirmation Code Extraction

import re

def extract_confirmation_code(text):
    # Pattern: 10 uppercase alphanumeric characters.
    # Note: any other 10-character uppercase token in the text also matches.
    pattern = r'\b[A-Z0-9]{10}\b'
    match = re.search(pattern, text)
    return match.group(0) if match else None

2. Date Extraction

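import re

def extract_dates(text, language='en'):
    date_patterns = {
        'fr': r'(\d{1,2}\s+\w+\s+\d{4})',           # e.g. "1 février 2026"
        'en': r'(\w+\s+\d{1,2},\s+\d{4})',           # e.g. "February 1, 2026"
        'pt': r'(\d{1,2}\s+de\s+\w+\s+de\s+\d{4})'   # e.g. "1 de fevereiro de 2026"
    }
    # Unknown languages fall back to the English pattern (an assumption)
    pattern = date_patterns.get(language, date_patterns['en'])
    # Raw date strings in order of appearance; conversion to ISO dates
    # is a separate step
    return re.findall(pattern, text)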
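Converting a matched date string into an ISO date requires mapping month names to numbers for each language. The sketch below covers only the three languages that have a date pattern above; `MONTHS` and `to_iso_date` are hypothetical names, and the code assumes that the day always precedes the year in the matched string, which holds for the three patterns shown.

```python
import re
from datetime import date

# Lowercase month names per language (assumed tables for this sketch).
MONTHS = {
    'en': ['january', 'february', 'march', 'april', 'may', 'june', 'july',
           'august', 'september', 'october', 'november', 'december'],
    'fr': ['janvier', 'février', 'mars', 'avril', 'mai', 'juin', 'juillet',
           'août', 'septembre', 'octobre', 'novembre', 'décembre'],
    'pt': ['janeiro', 'fevereiro', 'março', 'abril', 'maio', 'junho', 'julho',
           'agosto', 'setembro', 'outubro', 'novembro', 'dezembro'],
}

def to_iso_date(raw, language='en'):
    """Convert a matched date string to 'YYYY-MM-DD', or None on failure."""
    names = MONTHS.get(language, [])
    numbers = re.findall(r'\d+', raw)
    # [^\W\d_]+ matches runs of letters, including accented ones
    words = [w.lower() for w in re.findall(r'[^\W\d_]+', raw)]
    month = next((names.index(w) + 1 for w in words if w in names), None)
    if month is None or len(numbers) != 2:
        return None
    day, year = (int(n) for n in numbers)
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None  # e.g. a 31st of February
```

Returning `None` instead of raising keeps the decision about how to report a bad date with the caller, for example the validation step.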

3. Price Extraction

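import re

def extract_price(text, currency='EUR'):
    # Supports different formats: 980,00€, €980.00, 980 EUR
    symbols = {'EUR': '€', 'USD': '$', 'GBP': '£'}  # assumed symbol table, extend as needed
    unit = '(?:' + re.escape(currency) + '|' + re.escape(symbols.get(currency, currency)) + ')'
    amount = r'(\d+(?:[.,]\d+)?)'  # decimals are optional
    patterns = [amount + r'\s*' + unit, unit + r'\s*' + amount]
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return float(match.group(1).replace(',', '.'))
    return None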

Data Validation

Validation Rules

VALIDATION_RULES = {
    'confirmation_code': {
        'required': True,
        'pattern': r'^[A-Z0-9]{10}$',
        'message': 'Invalid confirmation code'
    },
    'checkin_date': {
        'required': True,
        'type': 'date',
        'future': True,
        'message': 'Invalid check-in date'
    },
    'total_amount': {
        'required': True,
        'type': 'float',
        'min': 0.01,
        'message': 'Invalid total amount'
    },
    'guest_email': {
        'required': True,
        'type': 'email',
        'message': 'Invalid email'
    }
}
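One way to apply such a rule table is a small generic checker that walks the rules and collects the messages of the fields that fail. This is a sketch under assumptions: `check_field` and `validate` are hypothetical names, the email check is deliberately loose, and `'future': True` is read here as "strictly after today".

```python
import re
from datetime import date

# Same rules as in the table above, repeated so the sketch is self-contained.
RULES = {
    'confirmation_code': {'required': True, 'pattern': r'^[A-Z0-9]{10}$',
                          'message': 'Invalid confirmation code'},
    'checkin_date': {'required': True, 'type': 'date', 'future': True,
                     'message': 'Invalid check-in date'},
    'total_amount': {'required': True, 'type': 'float', 'min': 0.01,
                     'message': 'Invalid total amount'},
    'guest_email': {'required': True, 'type': 'email',
                    'message': 'Invalid email'},
}

EMAIL_RE = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')  # loose on purpose

def check_field(value, rule, today):
    """Return True if value satisfies a single rule."""
    if value is None:
        return not rule.get('required', False)
    if 'pattern' in rule and not re.match(rule['pattern'], str(value)):
        return False
    kind = rule.get('type')
    if kind == 'date':
        try:
            parsed = date.fromisoformat(value)
        except (TypeError, ValueError):
            return False
        if rule.get('future') and parsed <= today:
            return False
    elif kind == 'float':
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if 'min' in rule and value < rule['min']:
            return False
    elif kind == 'email':
        if not EMAIL_RE.match(str(value)):
            return False
    return True

def validate(data, rules=RULES, today=None):
    """Return the messages of all failing fields (empty list if valid)."""
    today = today or date.today()
    return [rule['message'] for field, rule in rules.items()
            if not check_field(data.get(field), rule, today)]
```

Passing `today` explicitly keeps the check deterministic in tests, since the `future` rule otherwise depends on the current date.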

Validation Process

  1. Syntactic validation: Format and type
  2. Semantic validation: Data consistency
  3. Business validation: AirDoo-specific rules
  4. Consistency validation: Dates, prices, etc.
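The consistency step (dates, prices) can be illustrated with cross-field checks on the parsed data. The function name `check_consistency` is hypothetical. The price check assumes that the total equals the nightly rate times the number of nights plus the cleaning fee; that relation holds for the sample data in this document (150 x 6 + 80 = 980) but is an assumption, not a documented AirDoo rule.

```python
from datetime import date

def check_consistency(data):
    """Return a list of cross-field problems found in parsed data."""
    problems = []
    checkin = date.fromisoformat(data['checkin_date'])
    checkout = date.fromisoformat(data['checkout_date'])
    nights = data.get('nights')
    if checkout <= checkin:
        problems.append('checkout_date must be after checkin_date')
    elif nights is not None and nights != (checkout - checkin).days:
        problems.append('nights does not match the stay dates')
    breakdown = data.get('breakdown')
    if breakdown and nights is not None:
        # Assumed relation: total = nightly_rate * nights + cleaning_fee
        expected = breakdown['nightly_rate'] * nights + breakdown['cleaning_fee']
        if abs(expected - data['total_amount']) > 0.005:
            problems.append('total_amount does not match the breakdown')
    return problems

booking = {
    'checkin_date': '2026-02-01',
    'checkout_date': '2026-02-07',
    'nights': 6,
    'total_amount': 980.00,
    'breakdown': {'nightly_rate': 150.00, 'cleaning_fee': 80.00},
}
problems = check_consistency(booking)
```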

Error Handling

Error Types

  1. Format errors: Unrecognized email
  2. Data errors: Missing or invalid data
  3. Consistency errors: Internal inconsistencies
  4. System errors: Technical issues

Error Logging

from odoo import fields, models


class ParsingErrorLog(models.Model):
    _name = 'airdoo.parsing_error'
    _description = 'AirDoo Parsing Error'

    email_id = fields.Char('Email ID')
    error_type = fields.Selection([
        ('format', 'Unsupported Format'),
        ('data', 'Invalid Data'),
        ('consistency', 'Inconsistency'),
        ('system', 'System Error')
    ])
    error_message = fields.Text('Error Message')
    raw_content = fields.Text('Raw Content')
    parsed_data = fields.Text('Parsed Data')
    resolution_status = fields.Selection([
        ('pending', 'Pending'),
        ('resolved', 'Resolved'),
        ('ignored', 'Ignored')
    ])
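To connect the four error types to this model, one option is a small exception hierarchy whose classes carry the matching `error_type` key, plus a helper that builds the values for a log record. The exception classes and `build_error_values` are hypothetical names for illustration; in an Odoo environment the resulting dict would typically be passed to the model's `create` method.

```python
import json

class ParsingError(Exception):
    """Base class; anything unclassified is treated as a system error."""
    error_type = 'system'

class FormatError(ParsingError):
    error_type = 'format'

class DataError(ParsingError):
    error_type = 'data'

class ConsistencyError(ParsingError):
    error_type = 'consistency'

def build_error_values(email_id, raw_content, exc, parsed_data=None):
    """Build the field values for an airdoo.parsing_error record."""
    return {
        'email_id': email_id,
        # Exceptions outside the hierarchy fall back to 'system'
        'error_type': getattr(exc, 'error_type', 'system'),
        'error_message': str(exc),
        'raw_content': raw_content,
        # Partial results are stored as JSON text; False is Odoo's empty value
        'parsed_data': json.dumps(parsed_data) if parsed_data is not None else False,
        'resolution_status': 'pending',
    }

values = build_error_values('42', 'raw email body',
                            DataError('Missing guest_email'),
                            parsed_data={'nights': 6})
```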

Parsing Customization

Custom Parsing Rules

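from odoo import fields, models


class CustomParsingRule(models.Model):
    _name = 'airdoo.parsing_rule'
    _description = 'AirDoo Custom Parsing Rule'

    name = fields.Char('Rule Name')
    pattern = fields.Text('Regex Pattern')
    field_to_extract = fields.Char('Field to Extract')
    transformation = fields.Text('Transformation')
    priority = fields.Integer('Priority')
    # Default to True so that new rules are not archived on creation
    active = fields.Boolean('Active', default=True)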
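How such rules might be applied can be sketched without Odoo by using a plain dataclass as a stand-in for the records. Several points are assumptions, since the document does not define them: a lower `priority` value runs first, the first matching rule wins for a given field, and `transformation` holds the name of a simple string operation (`strip` or `upper` here).

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    """Plain stand-in for an airdoo.parsing_rule record."""
    name: str
    pattern: str
    field_to_extract: str
    transformation: str = ''
    priority: int = 10
    active: bool = True

# Hypothetical transformation names; unknown names leave the value unchanged.
TRANSFORMATIONS = {'strip': str.strip, 'upper': str.upper}

def apply_rules(text, rules):
    """Fill each field from the first matching active rule."""
    extracted = {}
    for rule in sorted(rules, key=lambda r: r.priority):
        if not rule.active or rule.field_to_extract in extracted:
            continue
        match = re.search(rule.pattern, text)
        if not match:
            continue
        # Use the first capture group when the pattern defines one
        value = match.group(1) if match.re.groups else match.group(0)
        if value is None:
            continue
        transform = TRANSFORMATIONS.get(rule.transformation, lambda s: s)
        extracted[rule.field_to_extract] = transform(value)
    return extracted

rules = [
    Rule('labelled code', r'code:\s*(\w+)', 'confirmation_code', 'upper', priority=5),
    Rule('bare code', r'\b[A-Z0-9]{10}\b', 'confirmation_code', priority=20),
    Rule('nights', r'(\d+) nights', 'nights', active=False),
]
extracted = apply_rules('code: hm2yeyhzxc, 6 nights', rules)
```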

Performance and Optimization

Parsing Cache

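from collections import OrderedDict


class ParsingCache:
    def __init__(self):
        # OrderedDict keeps entries ordered from least to most recently used
        self.cache = OrderedDict()
        self.max_size = 1000

    def get(self, email_hash):
        if email_hash not in self.cache:
            return None
        # Mark the entry as most recently used
        self.cache.move_to_end(email_hash)
        return self.cache[email_hash]

    def set(self, email_hash, parsed_data):
        if email_hash in self.cache:
            self.cache.move_to_end(email_hash)
        elif len(self.cache) >= self.max_size:
            # LRU eviction: drop the least recently used entry
            self.cache.popitem(last=False)
        self.cache[email_hash] = parsed_data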
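The cache is keyed by an `email_hash`, but the document does not say how that key is computed. A minimal sketch, assuming a SHA-256 digest of the raw email content is acceptable as the key (the function name `compute_email_hash` is hypothetical):

```python
import hashlib

def compute_email_hash(raw_email):
    """Return a stable hex digest of the raw email, usable as a cache key."""
    if isinstance(raw_email, str):
        raw_email = raw_email.encode('utf-8')
    return hashlib.sha256(raw_email).hexdigest()

key = compute_email_hash('Subject: Reservation confirmed\n\nHM2YEYHZXC')
```

Hashing the raw content rather than a message ID means that a re-delivered copy of the same email maps to the same cache entry, while any change in the body produces a new key.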

Unit Tests

Test Structure

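import unittest

# parse_email and load_fixture are assumed to come from the module and its test helpers


class TestAirbnbParser(unittest.TestCase):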

    def test_english_confirmation_email(self):
        email_content = load_fixture('english_confirmation.eml')
        result = parse_email(email_content, language='en')

        self.assertIsNotNone(result)
        self.assertEqual(result['confirmation_code'], 'HM2YEYHZXC')
        self.assertEqual(result['nights'], 6)
        self.assertEqual(result['total_amount'], 980.00)

    def test_english_modification_email(self):
        email_content = load_fixture('english_modification.eml')
        result = parse_email(email_content, language='en')

        self.assertIsNotNone(result)
        self.assertEqual(result['status'], 'modified')

    def test_invalid_email_format(self):
        email_content = "This is not an Airbnb email"
        result = parse_email(email_content)

        self.assertIsNone(result)

Maintenance and Evolution

Updating Parsers

  1. Monitoring: Success rate by format
  2. Detection: New unsupported formats
  3. Adaptation: Update existing rules
  4. Testing: Validate before deployment
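The monitoring step relies on a success rate per format. A minimal sketch of that computation, assuming parsing outcomes are available as `(format_name, succeeded)` pairs (both the function name and the input shape are hypothetical):

```python
from collections import Counter

def success_rate_by_format(outcomes):
    """Compute the share of successful parses for each format."""
    totals, successes = Counter(), Counter()
    for format_name, succeeded in outcomes:
        totals[format_name] += 1
        if succeeded:
            successes[format_name] += 1
    return {name: successes[name] / totals[name] for name in totals}

rates = success_rate_by_format([('fr', True), ('fr', False), ('en', True)])
```

A sudden drop for one format is a reasonable signal that the email layout has changed and the corresponding rules need updating.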

Best Practices

1. Robustness

  • Handle edge cases: Partial emails, hybrid formats
  • Strict validation: Reject doubtful data
  • Smart fallback: Alternative extraction attempts
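The fallback idea can be sketched as an ordered list of extractors, from most to least specific, where the first one that returns a value wins. The helper names are hypothetical, and the "Confirmation code" label is an assumed wording that real emails may not use.

```python
import re

def first_match(text, extractors):
    """Try each extractor in order; return the first non-None result."""
    for extractor in extractors:
        value = extractor(text)
        if value is not None:
            return value
    return None

def code_after_label(text):
    # Preferred attempt: the code directly follows an (assumed) label
    match = re.search(r'Confirmation code:?\s*([A-Z0-9]{10})\b', text)
    return match.group(1) if match else None

def code_anywhere(text):
    # Fallback attempt: any 10-character uppercase alphanumeric token
    match = re.search(r'\b[A-Z0-9]{10}\b', text)
    return match.group(0) if match else None

EXTRACTORS = [code_after_label, code_anywhere]
code = first_match('Your stay is booked. Ref HM2YEYHZXC', EXTRACTORS)
```

Because the fallback is looser, its results are the ones most worth passing through strict validation before they are accepted.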

2. Performance

  • Smart cache: Avoid unnecessary re-parsing
  • Lazy parsing: Extract only what is needed
  • Regex optimization: Compiled and efficient patterns
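Two of these points, compiled patterns and lazy parsing, can be combined in a short sketch: the patterns are compiled once at import time, and each field is extracted only on first access via `functools.cached_property` (Python 3.8+). The class name `LazyBooking` and the EUR-only amount pattern are assumptions made for illustration.

```python
import re
from functools import cached_property

# Compiled once when the module is imported, then reused for every email
CODE_RE = re.compile(r'\b[A-Z0-9]{10}\b')
AMOUNT_RE = re.compile(r'(\d+(?:[.,]\d+)?)\s*EUR')

class LazyBooking:
    """Extract fields from the email text only when they are read."""

    def __init__(self, text):
        self.text = text

    @cached_property
    def confirmation_code(self):
        match = CODE_RE.search(self.text)
        return match.group(0) if match else None

    @cached_property
    def total_amount(self):
        match = AMOUNT_RE.search(self.text)
        return float(match.group(1).replace(',', '.')) if match else None

booking = LazyBooking('HM2YEYHZXC total 980,00 EUR')
code = booking.confirmation_code  # only this field has been parsed so far
```

After the first access the value is stored on the instance, so repeated reads do not run the regex again.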

3. Maintainability

  • Modular code: Separate parsers by language/format
  • Tests: Maximum coverage of cases
